NUCLEIC ACIDS AND PROTEINS OF
C. ELEGANS INSULIN-LIKE GENES AND USES THEREOF

This application is a continuation-in-part of copending U.S. application Serial No.
09/084,303, filed May 26, 1998, which is a continuation-in-part of U.S. application Serial
No. 09/074,984, filed May 8, 1998, which is a continuation-in-part of U.S. application Serial
No. 09/062,580, filed April 17, 1998, each of which is incorporated by reference in its
entirety.

FIELD OF THE INVENTION

The present invention relates to C. elegans insulin-like genes and methods for
identifying insulin-like genes. The methods provide nucleotide sequences of C. elegans
insulin-like genes, amino acid sequences of their encoded proteins, and derivatives (e.g.,
fragments) and analogs thereof. The invention further relates to fragments (and derivatives
and analogs thereof) of insulin-like proteins which comprise one or more domains of an
insulin-like protein. Antibodies to an insulin-like protein, and derivatives and analogs
thereof, are provided. Methods of production of an insulin-like protein (e.g., by
recombinant means), and derivatives and analogs thereof, are provided. Methods to identify
the biological function of a C. elegans insulin-like gene are provided, including various
methods for the functional modification (e.g., overexpression, underexpression, mutation,
knock-out) of one gene, or of two or more genes simultaneously. Methods to identify a C.
elegans gene which modifies the function of, and/or functions in a downstream pathway
from, an insulin-like gene are provided.

BACKGROUND OF THE INVENTION

Insulin-like proteins are a large and widely-distributed group of structurally-related
peptide hormones that have pivotal roles in controlling animal growth, development,
reproduction, and metabolism. At least five different subfamilies of insulin-like proteins
have been identified in vertebrates, represented by insulin, insulin-like growth factor (IGF),
relaxin, relaxin-like factor (RLF), and placentin (also known as early placenta insulin-like
peptide, or ELIP).

Insulin superfamily members in invertebrates have been less extensively analyzed
than in vertebrates, but a number of different subgroups have been defined, including
molluscan insulin-related peptides (MIP-I to MIP-VII) (Smit et al., 1988, Nature 331:535-538;
Smit et al., 1995, Neuroscience 70:589-596), the bombyxins of lepidoptera (Kondo et
al., 1996, J. Mol. Biol. 259:926-937), and the locust insulin-related peptide (LIRP)
(Lagueux et al., 1990, Eur. J. Biochem. 187:249-254). More recently, putative orthologs of
both vertebrate insulin and IGF have been identified in a tunicate (McRory and Sherwood,
1997, DNA and Cell Biology 16:939-949). This is of significance since tunicates are
thought to be the closest living invertebrate relative to the progenitor from which vertebrates
evolved.

Apparent homologs of the insulin receptor have been identified in both the fruit fly
and the nematode (Petruzzelli et al., 1986, Proc. Natl. Acad. Sci. U.S.A. 83:4710-4714;
Kimura et al., 1997, Science 277:942-946). An insulin receptor homolog has been
characterized in Drosophila, termed DIR (Drosophila insulin receptor) (Ruan et al., 1995, J.
Biol. Chem. 270:4236-4243), which exhibits extensive homology with vertebrate insulin
and IGF receptors.

Recent discoveries from studies of C. elegans have also led to the identification of
components involved in a presumptive insulin signaling pathway and have shown clear
connections of this pathway to important aspects of metabolic regulation (reviewed in
Riddle and Albert, 1997, C. elegans II, Riddle et al., eds., Cold Spring Harbor Press,
Plainview, New York, pp. 739-768). Molecular cloning has revealed that the C. elegans
daf-2 gene is a nematode homolog of vertebrate insulin receptors. A daf-2 mutant animal
exhibits a dauer-constitutive phenotype. The dauer stage is an alternative developmental
stage that is induced when environmental factors are not adequate to promote successful
reproduction in C. elegans. Dauer larvae remain relatively motionless, stop feeding, have
increased deposition of fat, remain small in size, and are reproductively immature
(O'Riordan and Burnell, 1989, Comp. Biochem. Physiol. 92B:233-238). Two other genes,
age-1 and daf-16, have been placed in the same pathway as daf-2 based on analysis of
genetic interactions (Morris et al., 1996, Nature 382:536-539; Ogg et al., 1997, Nature
389:994-999; Lin et al., 1997, Science 278:1319-1322). The age-1 gene encodes a
nematode homolog of PI3K, and the action of age-1 is required for the propagation of a
daf-2 signal, in keeping with the role of PI3K in insulin signaling. Conversely, genetic analysis
has shown that the normal role of daf-16 is one of blocking a signal generated by activated
daf-2, and daf-16 has been found to encode a homolog of the HNF-3/forkhead family of
transcription factors.

There is another intriguing aspect to the phenotype of nematodes defective in
components of the daf-2 pathway with respect to effects on the life-span of the organism
(normally about 14 days). Mutations in daf-2 and age-1 can more than double the life-span
of animals, even under conditions that do not induce the formation of dauer larvae, and the
extension of life-span caused by daf-2 or age-1 mutations requires the activity of the daf-16
gene (Lin et al., 1997, Id.; Tissenbaum and Ruvkun, 1998, Genetics 148:703-717; Larsen et
al., 1995, Genetics 139:1567-1583).

Kawano et al., February 1, 1998, Worm Breeder's Gazette 15(2), 47, disclose
the sequences of the A and B chains of two C. elegans insulin-like proteins. Ruvkun et al.
disclose the nucleotide and protein sequences of several C. elegans insulin-like genes (Int'l
Publication No. WO 98/51351, Int'l Publication Date November 19, 1998). The GenBank®
accession numbers (in parentheses) for ZK75.1 (AAC 46744 & GI 733563);
ZK75.2 (AAC 46745 & GI 733561); ZK75.3 (AAC 46746 & GI 733562);
ZK84.6 (AAC 48208 & GI 2914123); ZK1251.2 (CAA 92498 & GI 3881514); C17C3.4
(AAB 52688 & GI 1086914); M04D8.2 (CAA 83611 & GI 3878561); M04D8.3 (CAA
83609 & GI 3878559); F56F3.6 (CAA 83603 & GI 3877712); and T28B8.N (CAB 03444
& GI 3880317) disclose sequences that are not annotated as insulin-like genes. Citation of
these references shall not be construed as an admission by applicant that they are available
as prior art to the claimed invention.

SUMMARY OF THE INVENTION 

The invention is directed to purified C. elegans insulin-like proteins, or derivatives
or fragments thereof that display one or more functional activities of a C. elegans
insulin-like protein. The invention is also directed to compositions comprising such insulin-like
proteins or derivatives or fragments. The invention also concerns non-human animals
comprising a transgene which encodes a C. elegans insulin-like protein. In preferred
embodiments, the C. elegans insulin-like protein comprises an amino acid sequence selected
from the group consisting of any one of SEQ ID NOs: 1-18, 158-161, or 198-206.

The invention is also directed to nucleic acids encoding C. elegans insulin-like
proteins, such as a nucleic acid comprising a nucleotide sequence selected from the group
consisting of any one of SEQ ID NOs: 19-36, 162-165, and 207-215, or the complement
thereof.

The invention also concerns methods of analyzing insulin expression or
mis-expression comprising observing a nematode for the effects of expression or mis-expression
of a C. elegans insulin-like protein, or derivative or fragment thereof that displays one or
more functional activities of a C. elegans animal, wherein said C. elegans insulin-like
protein has an amino acid sequence selected from the group consisting of any one of SEQ
ID NOs: 1-18, 158-161, or 198-206.

In preferred embodiments the C. elegans insulin-like protein is a member of
Class IV.

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1. Structural organization of precursor forms of the insulin superfamily of
hormones is illustrated. The different domains that make up precursor forms of insulin-like
hormones are represented as boxes labeled Pre, B, C, A, D, and E, extending from the
N-terminus (left) to the C-terminus (right) of the nascent polypeptide chain, respectively.
Domains that may remain in a mature hormone are represented as unshaded boxes (the B,
A, and D peptide domains) or as lightly hatched (the C or "connecting" peptide domain).
Domains that are removed during proteolytic processing are represented as shaded (the Pre
peptide domain) or as hatched (the E peptide domain). IGF hormones are unique in having
D and E peptide domains; these domains are represented as smaller boxes. Cleavage sites
utilized by proteases during proteolytic processing (i.e., protein maturation) are indicated
below the boxes. The asterisk marks the position of cleavage by signal peptidase. Arrows
indicate cleavage sites by prohormone convertases. Disulfide bonds (S-S) are represented
above the boxes with lines indicating connections between covalently-bonded Cys residues.

FIG. 2. Conserved structural features of insulin superfamily members are shown,
including aligned sequences of A and B peptide domains from diverse insulin superfamily
members. The alignment highlights the arrangement of conserved amino acid positions and their
relationship to the overall folding pattern of the protein. The common helical regions found
in the A and B chains are indicated by the symbol "<—>". 

FIG. 3. Alignment of the C. elegans insulin-like protein family.

FIG. 4. Annotated sequence of C. elegans insulin-like protein F13B12.N and
corresponding cDNA.

FIG. 5. Annotated sequence of C. elegans insulin-like protein ZK75.1 and
corresponding cDNA.

FIG. 6. Annotated sequence of C. elegans insulin-like protein ZK75.2 and
corresponding cDNA.

FIG. 7. Annotated sequence of C. elegans insulin-like protein ZK75.3 and
corresponding cDNA.

FIG. 8. Annotated sequence of C. elegans insulin-like protein ZK84.6 and
corresponding cDNA.

FIG. 9. Annotated sequence of C. elegans insulin-like protein ZK84.N2 and
corresponding cDNA.

FIG. 10. Annotated sequence of C. elegans insulin-like protein ZK1251.2 and
corresponding cDNA.

FIG. 11. Annotated sequence of C. elegans insulin-like protein ZK1251.N and
corresponding cDNA.

FIG. 12. Annotated sequence of C. elegans insulin-like protein C06E2.N and
corresponding cDNA.

FIG. 13. Annotated sequence of C. elegans insulin-like protein C17C3.4 and
corresponding cDNA.

FIG. 14. Annotated sequence of C. elegans insulin-like protein C17C3.N and
corresponding cDNA.

FIG. 15. Annotated sequence of C. elegans insulin-like protein M04D8.1 and
corresponding cDNA.

FIG. 16. Annotated sequence of C. elegans insulin-like protein M04D8.2 and
corresponding cDNA.

FIG. 17. Annotated sequence of C. elegans insulin-like protein M04D8.3 and
corresponding cDNA.

FIG. 18. Annotated sequence of C. elegans insulin-like protein ZK84.N and
corresponding cDNA.

FIG. 19. Annotated sequence of C. elegans insulin-like protein F56F3.6 and
corresponding cDNA.

FIG. 20. Annotated sequence of C. elegans insulin-like protein T28B8.N and
corresponding cDNA.

FIG. 21. Annotated sequence of C. elegans insulin-like protein ZC334.N and
corresponding cDNA.

FIG. 22. Annotated sequence of C. elegans insulin-like protein T08G5.N and
corresponding cDNA.

FIG. 23. Annotated sequence of C. elegans insulin-like protein F41G3.N and
corresponding cDNA.

FIG. 24. Annotated sequence of C. elegans insulin-like protein F41G3.N2 and
corresponding cDNA.

FIG. 25. Annotated sequence of C. elegans insulin-like protein C17C3.N2 and
corresponding cDNA.

FIG. 26. Annotated sequence of C. elegans insulin-like protein ZC334.N2 and
corresponding cDNA.

FIG. 27. Annotated sequence of C. elegans insulin-like protein ZC334.N3 and
corresponding cDNA.

FIG. 28. Annotated sequence of C. elegans insulin-like protein ZC334.N4 and
corresponding cDNA.

FIG. 29. Annotated sequence of C. elegans insulin-like protein ZC334.N5 and
corresponding cDNA.

FIG. 30. Annotated sequence of C. elegans insulin-like protein ZC334.N6 and
corresponding cDNA.

FIG. 31. Annotated sequence of C. elegans insulin-like protein ZC334.N7 and
corresponding cDNA.

FIG. 32A-32C. Annotated sequence of C. elegans insulin-like protein T10D4.N and
corresponding cDNA.

FIG. 33. Annotated sequence of C. elegans insulin-like protein T10D4.N2 and
corresponding cDNA.

FIG. 34. Annotated sequence of C. elegans insulin-like protein Y52A1.N and
corresponding cDNA.

DETAILED DESCRIPTION OF THE INVENTION 

To identify new and useful tools for probing the function and regulation of
the insulin signaling pathway, an extensive search for insulin-like genes in the genome of C.
elegans was conducted. The results of this search revealed a surprisingly large and
diverse family of insulin-like genes. These new insulin-like genes in C. elegans constitute
very useful tools for probing the function and regulation of their corresponding pathways.
Systematic genetic analysis of signaling pathways involving insulin-like proteins in
C. elegans can be expected to lead to the discovery of new drug targets, therapeutic proteins,
diagnostics and prognostics useful in the treatment of diseases and clinical problems
associated with the function of insulin superfamily hormones in humans and other animals,
as well as clinical problems associated with aging and senescence. Furthermore, analysis of
these same pathways using C. elegans insulin-like proteins as tools will have utility for
identification and validation of pesticide targets in invertebrate pests that are components of
these signaling pathways.

Use of C. elegans insulin-like genes for such purposes has advantages over
manipulation of other known components of the nematode daf-2 pathway, such as daf-2,
daf-16, and age-1. Use of ligand-encoding C. elegans insulin-like genes will provide a
superior approach for identifying factors that are upstream of the receptor in the signal
transduction pathway. Specifically, components involved in the synthesis, activation and
turnover of insulin-like proteins may be identified. Furthermore, the large number of
different insulin-like hormones could provide a means to separate components involved in
response to different, specific environmental signals, which may not be technically feasible
with manipulation of downstream components of the pathway found in target tissues.
Further, the diversity of different insulin-like hormones may provide a means to identify
new receptor and/or signal transduction systems for insulin superfamily hormones that are
structurally different from those that have been characterized to date in either vertebrates or
invertebrates. Finally, use of C. elegans as a system for analyzing the function and
regulation of insulin-like genes has great advantages over approaches in other organisms
due to the ability to rapidly carry out large-scale, systematic genetic screens as well as the
ability to screen small molecule libraries directly on whole organisms for possible
therapeutic or pesticide use.

One advantage of investigating insulin-like genes in C. elegans comes from the
tremendous progress made in the genome project for this organism. At the time of this
writing, approximately 90% of the C. elegans genome has been sequenced, and that data is
publicly available in GenBank®, as well as in a specialized database for the C. elegans
genome referred to as ACEDB (i.e., A C. elegans Data Base) (Waterston and Sulston, 1995,
"The genome of Caenorhabditis elegans", Proc. Natl. Acad. Sci. U.S.A. 92:10836-10840).
In spite of this wealth of genomic sequence information, the process of identifying authentic
insulin superfamily genes in C. elegans is not trivial.

There are a number of factors that made identifying insulin-like genes in C. elegans
genomic data particularly difficult. The insulin superfamily is fairly divergent at the
sequence level and the degree of sequence homology between vertebrate and C. elegans
insulin-like proteins is low. Furthermore, there are significant structural deviations in C.
elegans insulin-like proteins that are absent or not common in the well-characterized
vertebrate insulin-like proteins.

There are a number of software tools that can aid the process of identifying gene
homologs in the C. elegans genome, including gene prediction programs (e.g., GeneFinder),
sequence homology searching programs (e.g., BLAST, FASTA) and protein motif searching
programs (e.g., Prosite, BLOCKS, Markoff models). Nonetheless, identifying insulin-like
genes within the C. elegans genome posed a significant challenge that went beyond just the
straightforward application of any of these programs, due to the level of sequence
divergence and structural variation. These problems were confounded further by the fact
that insulins are small genes whose coding regions are often divided into smaller exons.
Small genes and exons are the most difficult to reliably predict from genomic sequence data
with gene finding programs, and small blocks of divergent sequence are difficult to identify
with homology searching programs as authentic sequence matches over those that would
occur by chance.

The Prosite sequence matches found in the C. elegans genome illustrate the
above-described problem. A pattern of specific amino acid residues has been derived from
comparison of insulin superfamily proteins, termed an "insulin family signature," that
reflects highly-conserved amino acid positions within the A chain of known insulin
molecules. There are 27 matches to the Prosite "insulin family signature" identified in the
C. elegans genome sequence and listed in ACEDB. Subsequent searches and analysis of
insulin-like genes have revealed that only five of the 27 Prosite matches correspond to
authentic insulin-like genes (as judged by criteria described below). Furthermore, at least
another 17 authentic insulin-like genes in C. elegans did not have matches to the Prosite
insulin family signature.
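
By way of illustration only, a signature search of this kind amounts to an exact pattern match over protein sequence. The following Python sketch converts a Prosite-style pattern into a regular expression and applies it to the human insulin A chain. The pattern string is an assumption written for illustration (it follows the general form of an A-chain cysteine signature and should be checked against the current Prosite entry before any real use), and the function and constant names are hypothetical.

```python
import re

# ASSUMPTION: illustrative Prosite-style pattern, not copied from the Prosite database.
ASSUMED_SIGNATURE = "C-C-{P}-x(2)-C-[STDNEKPI]-x(3)-[LIVMFS]-x(3)-C"

# Human insulin A chain, used only as a positive control.
HUMAN_INSULIN_A = "GIVEQCCTSICSLYQLENYCN"


def prosite_to_regex(pattern: str) -> str:
    """Convert a Prosite-style pattern such as 'C-x(2)-[ST]-{P}' to a Python regex."""
    parts = []
    for element in pattern.strip(".").split("-"):
        # Separate the residue specification from an optional repeat, e.g. x(3) or x(2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # any residue except those listed
        # '[...]' sets and single letters are already valid regex syntax.
        if lo and hi:
            core += "{%s,%s}" % (lo, hi)
        elif lo:
            core += "{%s}" % lo
        parts.append(core)
    return "".join(parts)


if __name__ == "__main__":
    regex = prosite_to_regex(ASSUMED_SIGNATURE)
    hit = re.search(regex, HUMAN_INSULIN_A)
    print(regex, hit.group() if hit else None)
```

A rigid pattern of this sort is sensitive to any insertion or substitution at a constrained position, which is consistent with the observation above that many authentic divergent family members escape a signature search.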

Given the difficulties in identifying insulin-like genes in the C. elegans genome, we
pursued a strategy of combining several tools to find and evaluate potential insulin
superfamily genes. Our search strategy used sequence features of known insulin
superfamily genes, but focused initially on identifying matches to either: (1) B peptide
region alone; (2) A peptide region alone; or (3) B and A peptide sequences fused together
(i.e., artificially). The A and B peptide regions (i.e., domains) of known insulin superfamily
proteins were chosen as queries since these are the most highly-conserved regions among
the superfamily. The searching programs that were employed for the initial canvassing of
the C. elegans genomic sequence included BLAST, FASTA, Markoff model searches, and
exact pattern match searches (i.e., regular expression searches). For matches to the B or A
peptide alone, the genomic sequence was examined manually, and with the aid of the
GeneFinder program, to identify a plausible nearby region encoding the other peptide in the
correct relative position (i.e., B peptide region N-terminal to A peptide region).
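
As a minimal sketch of the exact pattern match component only (not of BLAST, FASTA, or GeneFinder), the following Python translates a genomic DNA segment in all six reading frames and reports every match to a peptide-level regular expression. The function names and the idea of passing the pattern as a parameter are assumptions made for illustration; the codon table is the standard genetic code.

```python
import re
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def six_frame_translations(dna):
    """Yield (strand, frame, protein) for the three frames of each strand."""
    dna = dna.upper()
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
            # Codons containing ambiguous bases are translated as 'X'.
            yield strand, frame, "".join(CODON_TABLE.get(c, "X") for c in codons)


def scan(dna, peptide_regex):
    """Return (strand, frame, residue_offset, matched_peptide) for every hit."""
    hits = []
    for strand, frame, protein in six_frame_translations(dna):
        for m in re.finditer(peptide_regex, protein):
            hits.append((strand, frame, m.start(), m.group()))
    return hits
```

Hits for an A-peptide pattern and a B-peptide pattern on the same strand could then be compared by coordinate to check that the B region lies N-terminal to the A region. A scan of this kind does not see across introns, which is why manual inspection and gene prediction were still needed.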

In most cases, the B and A peptide matches did not form a continuous open reading
frame in the genomic DNA, and so the sequence was examined manually, and with the aid
of the GeneFinder program, for the presence of likely splice junctions that would join the
presumptive B and A peptide coding regions in-frame. Coding sequences N-terminal to
presumptive B peptide coding regions were further examined manually, and with the aid of
the GeneFinder program, for extended coding regions that might have a characteristic signal
sequence for secretion following an initiator methionine (Met) codon. Also, regions
upstream of the presumptive B peptide were examined manually, and with the aid of the
GeneFinder program, for potential splice sites that might join these segments to mRNA
leaders found in trans-spliced mRNAs.

Each genomic match with correctly-oriented B and A peptides was further evaluated
as follows to confirm that these regions preserved most of the structural features that are
important for the formation of the characteristic insulin secondary and tertiary structure: (1)
number and spacing of Cys residues involved in inter-chain and intra-chain disulfide bonds;
(2) hydrophobic residues that form the "insulin core" at the interface of the A and B chains;
(3) presence of Pro and Gly residues that promote characteristic breaks or turns between
secondary structure elements; and (4) presence of proteolytic processing signals for
maturation of the prehormone, especially removal of a C peptide, or regions preceding the B
peptide and following a secretory signal.
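
Criterion (1) above lends itself to a simple automated check. The following Python sketch compares the number and spacing of Cys residues in a candidate peptide with those of a reference chain. Using human insulin as the reference and allowing a per-gap tolerance are assumptions made for illustration, as are the function names; because C. elegans insulin-like proteins show structural deviations from vertebrate insulins, a strict match to a vertebrate reference should be read as one possible filter rather than the evaluation actually applied.

```python
# Reference chains: human insulin B and A peptides (illustrative reference only).
HUMAN_B = "FVNQHLCGSHLVEALYLVCGERGFFYTPKT"
HUMAN_A = "GIVEQCCTSICSLYQLENYCN"


def cys_gaps(peptide):
    """Return the number of residues lying between each pair of consecutive Cys."""
    positions = [i for i, aa in enumerate(peptide.upper()) if aa == "C"]
    return [b - a - 1 for a, b in zip(positions, positions[1:])]


def same_cys_spacing(candidate, reference, tolerance=0):
    """True if candidate has the same Cys count as reference and every
    inter-Cys gap differs by at most `tolerance` residues (hypothetical knob)."""
    got, want = cys_gaps(candidate), cys_gaps(reference)
    return len(got) == len(want) and all(
        abs(g - w) <= tolerance for g, w in zip(got, want)
    )


if __name__ == "__main__":
    print("B chain gaps:", cys_gaps(HUMAN_B))
    print("A chain gaps:", cys_gaps(HUMAN_A))
```

Criteria (2) through (4) depend on residue context and processing sites and are not addressed by this sketch.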

This strategy resulted in the identification of at least 31 insulin-like genes. The
structure and expression of the coding regions of 22 of these putative C. elegans insulin-like
genes have been confirmed using an experimental approach involving reverse transcription
of C. elegans mRNA, PCR amplification of specific cDNAs, cloning, and DNA sequencing.
The details of the conditions used for each putative insulin-like gene are described in the
Examples section below. Various non-limiting embodiments of the invention and
applications and uses of these novel C. elegans insulin-like genes and proteins are described
herein.

In a preferred embodiment, the invention provides a method of analyzing an effect of
expression or mis-expression of a C. elegans insulin-like gene comprising observing a first
nematode genetically engineered to express or mis-express a C. elegans insulin-like protein
of any one of groups I, II or IV, or a derivative or fragment thereof that displays one or more
functional activities of the C. elegans insulin-like protein. In another specific embodiment,
the C. elegans protein is of group I.

In yet another specific embodiment, the claimed methods and products do not 
involve the proteins or nucleic acids of SEQ ID NOs: 6, 12, 24, or 30. 

Isolation of C. elegans insulin-like genes 

The invention relates to the nucleotide sequences of C. elegans insulin-like nucleic
acids. In one embodiment, the insulin-like nucleic acids encode an insulin-like protein
comprising the sequence of any one of SEQ ID NOs: 1-18, 158-161, and 198-206. In
another aspect, the invention provides a nucleic acid comprising a nucleotide sequence
encoding at least a portion of an insulin-like protein, wherein the portion consists of at least
5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 25, 30, 60, or 100 contiguous residues of any one of
SEQ ID NOs: 1-18, 158-161, and 198-206. In a more specific embodiment, the nucleotide
sequences comprise at least 8 contiguous nucleotides (i.e., a hybridizable portion) of the
cDNA sequences of any one of SEQ ID NOs: 19-36, 162-165, and 207-215. In a preferred
aspect, the nucleic acid sequences encode a Class IV C. elegans insulin-like polypeptide
having the structure of a Class IV polypeptide (as further described in Example 2 below),
such as the polypeptide defined by the amino acid sequence of any one of SEQ ID NOs: 12-15,
18, or 198-203. Preferably, the nucleic acids consist of at least 10 (contiguous)
nucleotides, 25 nucleotides, 50 nucleotides, 100 nucleotides, 150 nucleotides, 200
nucleotides, or 300 nucleotides of an insulin-like sequence, or a full-length insulin-like coding
sequence. In another embodiment, a nucleic acid comprising at least a portion of a C.
elegans insulin-like nucleic acid of the invention is smaller than 100, 200, 500, 10,000,
15,000, 20,000 or 30,000 nucleotides in length. Nucleic acids can be single or double
stranded. The invention also relates to nucleic acids hybridizable to or complementary to
the foregoing sequences. In specific aspects, nucleic acids are provided which comprise a
sequence complementary to at least 10, 25, 50, 100, or 200 nucleotides or the entire coding
region of an insulin-like gene.

Hybridization conditions

In a specific embodiment, a nucleic acid which is hybridizable to an insulin-like
nucleic acid (e.g., having a sequence as set forth in SEQ ID NOs: 19-36, 162-165, and
207-215), or to a nucleic acid encoding an insulin-like derivative, under conditions of low
stringency is provided. By way of example and not limitation, procedures using such
conditions of low stringency are as follows (see also Shilo and Weinberg, 1981, Proc. Natl.
Acad. Sci. U.S.A. 78:6789-6792). Filters containing DNA are pretreated for 6 h at 40°C in
a solution containing 35% formamide, 5X SSC, 50 mM Tris-HCl (pH 7.5), 5 mM EDTA,
0.1% PVP, 0.1% Ficoll, 1% BSA, and 500 µg/ml denatured salmon sperm DNA.
Hybridizations are carried out in the same solution with the following modifications: 0.02%
PVP, 0.02% Ficoll, 0.2% BSA, 100 µg/ml salmon sperm DNA, 10% (wt/vol) dextran
sulfate, and 5-20 X 10^6 cpm 32P-labeled probe is used. Filters are incubated in hybridization
mixture for 18-20 h at 40°C, and then washed for 1.5 h at 55°C in a solution containing 2X
SSC, 25 mM Tris-HCl (pH 7.4), 5 mM EDTA, and 0.1% SDS. The wash solution is
replaced with fresh solution and incubated an additional 1.5 h at 60°C. Filters are blotted
dry and exposed for autoradiography. If necessary, filters are washed for a third time at
65-68°C and re-exposed to film. Other conditions of low stringency which may be used are
well known in the art (e.g., as employed for cross-species hybridizations).

In another specific embodiment, a nucleic acid which is hybridizable to an insulin-like
nucleic acid under conditions of high stringency is provided. By way of example and
not limitation, procedures using such conditions of high stringency are as follows.
Prehybridization of filters containing DNA is carried out for 8 h to overnight at 65°C in
buffer composed of 6X SSC, 50 mM Tris-HCl (pH 7.5), 1 mM EDTA, 0.02% PVP, 0.02%
Ficoll, 0.02% BSA, and 500 µg/ml denatured salmon sperm DNA. Filters are hybridized
for 48 h at 65°C in prehybridization mixture containing 100 µg/ml denatured salmon sperm
DNA and 5-20 X 10^6 cpm of 32P-labeled probe. Washing of filters is done at 37°C for 1 h
in a solution containing 2X SSC, 0.01% PVP, 0.01% Ficoll, and 0.01% BSA. This is
followed by a wash in 0.1X SSC at 50°C for 45 min before autoradiography. Other
conditions of high stringency which may be used are well known in the art.

In another specific embodiment, a nucleic acid which is hybridizable to an insulin-like
nucleic acid under conditions of moderate stringency is provided. Selection of
appropriate conditions for such stringencies is well known in the art (see e.g., Sambrook et
al., 1989, Molecular Cloning, A Laboratory Manual, 2d Ed., Cold Spring Harbor Laboratory
Press, Cold Spring Harbor, New York; see also, Ausubel et al., eds., in the Current
Protocols in Molecular Biology series of laboratory technique manuals, © 1987-1997
Current Protocols, © 1994-1997 John Wiley and Sons, Inc.).
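
Stringency reflects how close the hybridization or wash temperature is to the melting temperature (Tm) of the hybrid, which falls as salt concentration drops or formamide is added. As a rough guide only, the following Python sketch applies a commonly cited empirical approximation for long DNA:DNA hybrids. The formula and its coefficients are not taken from this specification and vary between references; the sodium content assumed for 1X SSC (0.15 M NaCl plus 0.015 M trisodium citrate), the function name, and the example GC content and probe length are all assumptions for illustration.

```python
import math

# ASSUMPTION: Na+ contributed by 1X SSC (0.15 M NaCl + 3 x 0.015 M trisodium citrate).
NA_MOLAR_PER_1X_SSC = 0.195


def estimate_tm(ssc_x, percent_gc, length_bp, percent_formamide=0.0):
    """Approximate Tm (deg C) of a long DNA:DNA hybrid.

    Empirical form (an approximation, labeled as an assumption):
    Tm = 81.5 + 16.6*log10[Na+] + 0.41*(%GC) - 0.61*(%formamide) - 500/length
    """
    na = ssc_x * NA_MOLAR_PER_1X_SSC
    return (81.5 + 16.6 * math.log10(na) + 0.41 * percent_gc
            - 0.61 * percent_formamide - 500.0 / length_bp)


if __name__ == "__main__":
    # Hypothetical probe: 300 bp, 40% GC (illustrative values only).
    for label, ssc, formamide, temp in [
        ("low-stringency hybridization", 5.0, 35.0, 40.0),
        ("high-stringency hybridization", 6.0, 0.0, 65.0),
        ("high-stringency final wash", 0.1, 0.0, 50.0),
    ]:
        tm = estimate_tm(ssc, 40.0, 300, formamide)
        print(f"{label}: Tm ~ {tm:.1f} C, working temperature {temp:.0f} C")
```

The smaller the difference between the estimated Tm and the working temperature, the fewer mismatches a hybrid can tolerate, which is the sense in which the conditions above are described as low, moderate, or high stringency.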

Nucleic acids encoding derivatives and analogs of insulin-like proteins, and
insulin-like antisense nucleic acids are additionally provided. As is readily apparent, as used herein,
a "nucleic acid encoding a fragment or portion of an insulin-like protein" shall be construed
as referring to a nucleic acid encoding only the recited fragment or portion of the
insulin-like protein and not the other contiguous portions of the insulin-like protein as a continuous
sequence.

Fragments of insulin-like nucleic acids comprising regions conserved between (i.e.,
with homology to) other insulin-like nucleic acids, of the same or different species, are also
provided. Nucleic acids encoding one or more insulin-like protein domains are provided.

Cloning procedures 

For expression cloning, an expression library can be constructed using known
methods. For example, mRNA is isolated, cDNA is made and ligated into an expression
vector (e.g., a bacteriophage derivative) such that it is capable of being expressed by the
host cell into which it is then introduced. Various screening assays can then be used to
select for the expressed insulin-like product. In one embodiment, anti-insulin-like
antibodies can be used for selection.

In another embodiment, polymerase chain reaction (PCR) is used to amplify the
desired sequence in a genomic or cDNA library, prior to selection. Oligonucleotide primers
representing known insulin-like sequences can be used as primers in PCR. In a preferred
aspect, the oligonucleotide primers represent at least part of conserved segments of strong
homology between insulin-like genes of different species. The synthetic oligonucleotides
may be utilized as primers to amplify sequences from a source (RNA or DNA), preferably a
cDNA library, of potential interest. PCR can be carried out, e.g., by use of a Perkin-Elmer
Cetus thermal cycler and Taq polymerase (e.g., Gene Amp™). The nucleic acid being
amplified can include mRNA or cDNA or genomic DNA from any species. One may
synthesize degenerate primers for amplifying homologs from other species in the PCR
reactions. It is also possible to vary the stringency of hybridization conditions used in
priming the PCR reactions, to allow for greater or lesser degrees of nucleotide sequence
similarity between the known insulin-like nucleotide sequences and a nucleic acid homolog
(or ortholog) being isolated. For cross-species hybridization, low stringency conditions are
preferred. For same-species hybridization, moderately stringent conditions are preferred.
After successful amplification of a segment of an insulin-like homolog, that segment may be
cloned and sequenced by standard techniques, and utilized as a probe to isolate a complete
cDNA or genomic clone. This, in turn, permits the determination of the gene's complete
nucleotide sequence, the analysis of its expression, and the production of its protein product
for functional analysis, as described below. In this fashion, additional genes encoding
insulin-like proteins and insulin-like analogs may be identified.
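
When degenerate primers are designed from a conserved peptide segment, one practical consideration is how many distinct oligonucleotides the primer pool must contain. The following Python sketch, offered only as an illustration, counts the codon combinations that encode a peptide under the standard genetic code and picks the least degenerate window of a given width. The function names are hypothetical, and real primer design would also weigh codon usage, melting temperature, and the 3' end of the primer.

```python
# Number of codons per amino acid in the standard genetic code.
CODONS_PER_RESIDUE = {
    "M": 1, "W": 1,
    "C": 2, "D": 2, "E": 2, "F": 2, "H": 2, "K": 2, "N": 2, "Q": 2, "Y": 2,
    "I": 3,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "L": 6, "R": 6, "S": 6,
}


def degeneracy(peptide):
    """Number of distinct nucleotide sequences that encode `peptide`."""
    n = 1
    for aa in peptide.upper():
        n *= CODONS_PER_RESIDUE[aa]
    return n


def least_degenerate_window(peptide, width):
    """Return (degeneracy, offset, window) for the best window of `width` residues.

    Ties are resolved in favor of the window closest to the N-terminus.
    """
    windows = [
        (degeneracy(peptide[i:i + width]), i, peptide[i:i + width])
        for i in range(len(peptide) - width + 1)
    ]
    return min(windows)


if __name__ == "__main__":
    # Human insulin A chain, used here only as a familiar example sequence.
    print(least_degenerate_window("GIVEQCCTSICSLYQLENYCN", 6))
```

Windows rich in Met, Trp, and two-codon residues such as Cys keep the pool small, whereas Leu, Arg, and Ser each multiply it sixfold.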

The above-described methods are not meant to limit the following general 
description of methods by which clones of insulin-like genes may be obtained. 

Any eukaryotic cell potentially can serve as the nucleic acid source for molecular
cloning of an insulin-like gene. The nucleic acid sequences encoding insulin-like proteins
may be isolated from vertebrate, mammalian, human, porcine, bovine, feline, avian, equine,
canine, as well as additional primate sources, insects (e.g., Drosophila), invertebrates (e.g.,
C. elegans), plants, etc. The DNA may be obtained by standard procedures known in the art
from cloned DNA (e.g., a DNA "library"), by chemical synthesis, by cDNA cloning, or by
the cloning of genomic DNA, or fragments thereof, purified from the desired cell (see e.g.,
Sambrook et al., supra; Glover (ed.), 1985, DNA Cloning: A Practical Approach, MRL
Press, Ltd., Oxford, U.K., Vol. I, II). Clones derived from genomic DNA may contain
regulatory and intron DNA regions in addition to coding regions; clones derived from
cDNA will contain only exon sequences. Whatever the source, the gene should be
molecularly cloned into a suitable vector for propagation of the gene.

In the molecular cloning of the gene from genomic DNA, DNA fragments are
generated, some of which will encode the desired gene. The DNA may be cleaved at
specific sites using various restriction enzymes. Alternatively, one may use DNAse in the
presence of manganese to fragment the DNA, or the DNA can be physically sheared, as for
example, by sonication. The linear DNA fragments can then be separated according to size
by standard techniques, such as agarose and polyacrylamide gel electrophoresis and column
chromatography.

Once the DNA fragments are generated, identification of the specific DNA fragment
containing the desired gene may be accomplished in a number of ways. For example, if a
portion of an insulin-like gene or its specific RNA or a fragment thereof is available and can
be purified and labeled, the generated DNA fragments may be screened by nucleic acid
hybridization to the labeled probe (e.g., Benton and Davis, 1977, Science 196:180). Those
DNA fragments with substantial homology to the probe will hybridize. It is also possible to
identify the appropriate fragment by restriction enzyme digestion(s) and comparison of
fragment sizes with those expected according to a known restriction map if such is
available. Further selection can be carried out on the basis of the properties of the gene.
Alternatively, the presence of the desired gene may be detected by assays based on the
physical, chemical, or immunological properties of its expressed product. For example,
cDNA clones, or DNA clones which hybrid-select the proper mRNAs, can be selected and
expressed to produce a protein that has, e.g., electrophoretic migration, isoelectric focusing
behavior, proteolytic digestion maps, hormonal activity, binding activity, or antigenic
properties similar or identical to those known for an insulin-like protein. Using an antibody
to a known insulin-like protein, other insulin-like proteins may be identified by binding of
the labeled antibody to expressed putative insulin-like proteins, e.g., in an ELISA
(enzyme-linked immunosorbent assay)-type procedure. Further, using a binding protein
specific to a known insulin-like protein, other insulin-like proteins may be identified by
binding to such a protein (see e.g., Clemmons, 1993, Mol. Reprod. Dev. 35:368-374;
Loddick et al., 1998, Proc. Natl. Acad. Sci. U.S.A. 95:1894-1898).
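
The comparison of fragment sizes against a known restriction map, mentioned above, can be
sketched in Python as follows. This is a minimal illustration that assumes a linear clone
and a single enzyme; EcoRI (recognition site GAATTC, cut after the first G) is used as the
default, and the function names `digest` and `matches_map`, the size tolerance, and the
example sequence are hypothetical.

```python
def digest(sequence, site="GAATTC", cut_offset=1):
    """Return fragment lengths from cutting a linear sequence at every site.

    cut_offset is the number of bases of the recognition site left on the
    upstream fragment (1 for EcoRI, which cuts G^AATTC).
    """
    sequence = sequence.upper()
    cuts, start = [], 0
    while True:
        idx = sequence.find(site, start)
        if idx == -1:
            break
        cuts.append(idx + cut_offset)
        start = idx + 1
    bounds = [0] + cuts + [len(sequence)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def matches_map(observed, expected, tolerance=0):
    """Compare sorted fragment sizes, allowing a per-fragment size tolerance."""
    if len(observed) != len(expected):
        return False
    return all(abs(o - e) <= tolerance
               for o, e in zip(sorted(observed), sorted(expected)))


# Made-up sequence carrying two EcoRI sites.
clone = "AAAA" + "GAATTC" + "CCCCCC" + "GAATTC" + "TT"
fragments = digest(clone)
print(fragments, matches_map(fragments, [12, 7, 5]))
```

In practice, gel-estimated sizes carry measurement error, so a nonzero `tolerance` would
normally be supplied.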

An insulin-like gene can also be identified by mRNA selection using nucleic acid
hybridization followed by in vitro translation. In this procedure, fragments are used to
isolate complementary mRNAs by hybridization. Such DNA fragments may represent
available, purified insulin-like DNA of another species (e.g., Drosophila, mouse, human).
Immunoprecipitation analysis or functional assays (e.g., aggregation ability in vitro, binding
to receptor, etc.) of the in vitro translation products of the isolated mRNAs identifies the
mRNA and, therefore, the complementary DNA fragments that contain the desired
sequences. In addition, specific mRNAs may be selected by adsorption of polysomes
isolated from cells to immobilized antibodies specifically directed against insulin-like
protein. A radiolabeled insulin-like cDNA can be synthesized using the selected mRNA
(from the adsorbed polysomes) as a template. The radiolabeled mRNA or cDNA may then
be used as a probe to identify the insulin-like DNA fragments from among other genomic
DNA fragments.

Alternatives to isolating the insulin-like genomic DNA include chemically
synthesizing the gene sequence itself from a known sequence or making cDNA to the
mRNA which encodes the insulin-like protein. For example, RNA for cDNA cloning of the
insulin-like gene can be isolated from cells which express the gene.

The identified and isolated gene can then be inserted into an appropriate cloning
vector. A large number of vector-host systems known in the art may be used. Possible
vectors include plasmids or modified viruses, but the vector system must be compatible
with the host cell used. Such vectors include bacteriophages such as lambda derivatives, or
plasmids such as pBR322 or pUC plasmid derivatives or the Bluescript vector (Stratagene).
The insertion into a cloning vector can, for example, be accomplished by ligating the DNA
fragment into a cloning vector which has complementary cohesive termini. However, if the
complementary restriction sites used to fragment the DNA are not present in the cloning
vector, the ends of the DNA molecules may be enzymatically modified. Alternatively, any
site desired may be produced by ligating nucleotide sequences (linkers) onto the DNA
termini; these ligated linkers may comprise specific chemically synthesized oligonucleotides
encoding restriction endonuclease recognition sequences. In an alternative method, the
cleaved vector and an insulin-like gene may be modified by homopolymeric tailing.
Recombinant molecules can be introduced into host cells via transformation, transfection,
infection, electroporation, etc., so that many copies of the gene sequence are generated.

In an alternative method, the desired gene may be identified and isolated after 
insertion into a suitable cloning vector in a "shot gun" approach. Enrichment for the desired 
gene, for example, by size fractionation, can be done before insertion into the cloning
vector. 

In specific embodiments, transformation of host cells with recombinant DNA
molecules that incorporate an isolated insulin-like gene, cDNA, or synthesized DNA
sequence enables generation of multiple copies of the gene. Thus, the gene may be obtained
in large quantities by growing transformants, isolating the recombinant DNA molecules
from the transformants and, when necessary, retrieving the inserted gene from the isolated
recombinant DNA.

The insulin-like sequences provided by the instant invention include those
nucleotide sequences encoding substantially the same amino acid sequences as found in
native insulin-like proteins, and those encoded amino acid sequences with functionally
equivalent amino acids, as well as those encoding other insulin-like derivatives or analogs,
as described below for insulin-like derivatives and analogs.

Expression of C. elegans insulin-like genes 

The nucleotide sequence coding for an insulin-like protein or a functionally active
analog or fragment or other derivative thereof, can be inserted into an appropriate
expression vector, i.e., a vector which contains the necessary elements for the transcription
and translation of the inserted protein-coding sequence. The necessary transcriptional and
translational signals can also be supplied by the native insulin-like gene and/or its flanking
regions. A variety of host-vector systems may be utilized to express the protein-coding
sequence such as mammalian cell systems infected with virus (e.g., vaccinia virus,
adenovirus, etc.); insect cell systems infected with virus (e.g., baculovirus); microorganisms
such as yeast containing yeast vectors, or bacteria transformed with bacteriophage DNA,
plasmid DNA, or cosmid DNA. The expression elements of vectors vary in their strengths
and specificities. Depending on the host-vector system utilized, any one of a number of
suitable transcription and translation elements may be used. In yet another embodiment, a
fragment of an insulin-like protein comprising one or more domains of the insulin-like
protein is expressed.

Any of the methods previously described for the insertion of DNA fragments into a
vector may be used to construct expression vectors containing a chimeric gene consisting of
appropriate transcriptional/translational control signals and the protein coding sequences.
These methods may include in vitro recombinant DNA and synthetic techniques and in vivo
recombinants (genetic recombination). Expression of a nucleic acid sequence encoding an
insulin-like protein or peptide fragment may be regulated by a second nucleic acid sequence
so that the insulin-like protein or peptide is expressed in a host transformed with the
recombinant DNA molecule. For example, expression of an insulin-like protein may be
controlled by any promoter/enhancer element known in the art. Promoters which may be
used to control insulin-like gene expression include the SV40 early promoter region, the
promoter contained in the 3' long terminal repeat of Rous sarcoma virus, the herpes
thymidine kinase promoter, the regulatory sequences of the metallothionein gene; prokaryotic
expression vectors such as the β-lactamase promoter, or the lac promoter; plant expression
vectors comprising the nopaline synthetase promoter or the cauliflower mosaic virus 35S
RNA promoter, and the promoter of the photosynthetic enzyme ribulose biphosphate
carboxylase; promoter elements from yeast or other fungi such as the Gal 4 promoter, the
alcohol dehydrogenase promoter, phosphoglycerol kinase promoter, alkaline phosphatase
promoter, and the following animal transcriptional control regions, which exhibit tissue
specificity and have been utilized in transgenic animals: elastase I gene control region which
is active in pancreatic acinar cells (Swift et al., 1984, Cell 38:639-646); a gene control
region which is active in pancreatic beta cells (Hanahan, 1985, Nature 315:115-122), an
immunoglobulin gene control region which is active in lymphoid cells (Grosschedl et al.,
1984, Cell 38:647-658), mouse mammary tumor virus control region which is active in
testicular, breast, lymphoid and mast cells (Leder et al., 1986, Cell 45:485-495), albumin
gene control region which is active in liver (Pinkert et al., 1987, Genes and Devel. 1:268-276),
alpha-fetoprotein gene control region which is active in liver (Krumlauf et al., 1985,
Mol. Cell. Biol. 5:1639-1648); alpha 1-antitrypsin gene control region which is active in the
liver (Kelsey et al., 1987, Genes and Devel. 1:161-171), beta-globin gene control region
which is active in myeloid cells (Mogram et al., 1985, Nature 315:338-340); myelin basic
protein gene control region which is active in oligodendrocyte cells in the brain (Readhead
et al., 1987, Cell 48:703-712); myosin light chain-2 gene control region which is active in
skeletal muscle (Sani, 1985, Nature 314:283-286), and gonadotropic releasing hormone
gene control region which is active in the hypothalamus (Mason et al., 1986, Science
234:1372-1378).

In a specific embodiment, a vector is used that comprises a promoter operably linked 
to an insulin-like gene nucleic acid, one or more origins of replication, and, optionally, one 
or more selectable markers (e.g., an antibiotic resistance gene). 

Expression constructs can be made by subcloning an insulin-like coding sequence
into the EcoRI restriction site of each of the three pGEX vectors (Smith and Johnson, 1988,
Gene 67:31-40). This allows for the expression of the insulin-like protein product from the
subclone in the correct reading frame.
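
The reason for subcloning into each of three vectors can be shown with a small Python
sketch: of three vectors whose cloning sites sit in the three possible phases relative to the
upstream fusion partner, exactly one places a given insert in frame. The vector names and
phase values below are hypothetical placeholders, not the actual pGEX vector geometry, and
the function name `compatible_vectors` is likewise an assumption.

```python
def compatible_vectors(insert_lead, vector_phases):
    """Return the vectors that keep the insert in frame with the fusion partner.

    insert_lead   -- nucleotides between the cloning junction and the first
                     base of the insert's first codon
    vector_phases -- maps vector name to the number of nucleotides (0, 1, or 2)
                     between the fusion partner's last complete codon and
                     the cloning junction
    """
    return [name for name, phase in vector_phases.items()
            if (phase + insert_lead) % 3 == 0]


# Hypothetical set of three vectors covering all three phases.
vectors = {"vector_frame_a": 0, "vector_frame_b": 1, "vector_frame_c": 2}

for lead in range(3):
    print(lead, compatible_vectors(lead, vectors))
```

Whatever the value of `insert_lead`, one and only one of the three phases satisfies the
in-frame condition, which is why a set of three vectors is sufficient.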

Expression vectors containing insulin-like gene inserts can be identified by three
general approaches: (a) nucleic acid hybridization; (b) presence or absence of "marker" gene
functions; and (c) expression of inserted sequences. In the first approach, the presence of an
insulin-like gene inserted in an expression vector can be detected by nucleic acid
hybridization using probes comprising sequences that are homologous to an inserted
insulin-like gene. In the second approach, the recombinant vector/host system can be
identified and selected based upon the presence or absence of certain "marker" gene
functions (e.g., thymidine kinase activity, resistance to antibiotics, transformation
phenotype, occlusion body formation in baculovirus, etc.) caused by the insertion of an
insulin-like gene in the vector. For example, if the insulin-like gene is inserted within the
marker gene sequence of the vector, recombinants containing the insulin-like insert can be
identified by the absence of the marker gene function. In the third approach, recombinant
expression vectors can be identified by assaying the insulin-like product expressed by the
recombinant. Such assays can be based, for example, on the physical or functional
properties of the insulin-like protein in in vitro assay systems, e.g., binding with
anti-insulin-like protein antibody.

Once a particular recombinant DNA molecule is identified and isolated, several
methods known in the art may be used to propagate it. Once a suitable host system and
growth conditions are established, recombinant expression vectors can be propagated and
prepared in quantity. Some of the expression vectors which can be used include human or
animal viruses such as vaccinia virus or adenovirus; insect viruses such as baculovirus;
yeast vectors; bacteriophage vectors (e.g., lambda phage), and plasmid and cosmid DNA
vectors.

In addition, a host cell strain may be chosen which modulates the expression of the
inserted sequences, or modifies and processes the gene product in the specific fashion
desired. Expression from certain promoters can be elevated in the presence of certain
inducers; thus, expression of the genetically engineered insulin-like protein may be
controlled. Furthermore, different host cells have characteristic and specific mechanisms
for the translational and post-translational processing and modification (e.g., glycosylation,
phosphorylation) of proteins. Appropriate cell lines or host systems can be chosen to ensure
the desired modification and processing of the foreign protein expressed. For example,
expression in a bacterial system can be used to produce a non-glycosylated core protein
product. Expression in yeast will produce a glycosylated product. Expression in
mammalian cells can be used to ensure "native" glycosylation of a heterologous protein.
Furthermore, different vector/host expression systems may effect processing reactions to
different extents.

In other embodiments of the invention, the insulin-like protein, fragment, analog, or
derivative may be expressed as a fusion, or chimeric protein product (comprising the
protein, fragment, analog, or derivative joined via a peptide bond to a heterologous protein
sequence of a different protein). Such a chimeric product can be made by ligating the
appropriate nucleic acid sequences encoding the desired amino acid sequences to each other
by methods known in the art, in the proper coding frame, and expressing the chimeric
product by methods commonly known in the art. Alternatively, such a chimeric product
may be made by protein synthetic techniques, e.g., by use of a peptide synthesizer.

Identification and purification of gene products 

The invention provides compositions comprising amino acid sequences of insulin-like
proteins and fragments and derivatives thereof which comprise an antigenic determinant
(i.e., can be recognized by an antibody) or which are otherwise functionally active, as well
as nucleic acid sequences encoding the foregoing. "Functionally active" insulin-like
material as used herein refers to that material displaying one or more functional activities
associated with a full-length (wild-type) insulin-like protein, e.g., binding to an insulin-like
receptor (e.g., daf-2) or insulin-like protein binding partner, antigenicity (binding to an
anti-insulin-like protein antibody), immunogenicity, etc. The compositions may consist
essentially of the insulin-like proteins and fragments and derivatives thereof. Alternatively,
the insulin-like proteins and fragments and derivatives thereof may be a component of a
composition that comprises other components, for example, a diluent such as saline, a
pharmaceutically acceptable carrier or excipient, a culture medium, etc.

In specific embodiments, the invention provides fragments of an insulin-like protein
consisting of at least 6 amino acids, 10 amino acids, 20 amino acids, 50 amino acids, or of
at least 75 amino acids. In other embodiments, the proteins comprise or consist essentially
of an insulin-like B peptide domain, an insulin-like A peptide domain, an insulin-like C
peptide domain, or any combination of the foregoing, of an insulin-like protein. Fragments,
or proteins comprising fragments, lacking some or all of the foregoing regions of an
insulin-like protein are also provided. Nucleic acids encoding the foregoing are provided.

Once a recombinant which expresses the insulin-like gene sequence is identified, the
gene product can be analyzed. This is achieved by assays based on the physical or
functional properties of the product, including radioactive labeling of the product followed
by analysis by gel electrophoresis, immunoassay, etc. The gene product may be isolated and
purified by standard methods including chromatography (e.g., ion exchange, affinity, and
sizing column chromatography), centrifugation, differential solubility, or by any other
standard technique for the purification of proteins. The functional properties may be
evaluated using any suitable assay. The amino acid sequence of the protein can be deduced
from the nucleotide sequence of the chimeric gene contained in the recombinant. As a
result, the protein can be synthesized by standard chemical methods known in the art (e.g.,
see Hunkapiller et al., 1984, Nature 310:105-111).

In an alternate embodiment, native insulin-like proteins can be purified from natural
sources, by standard methods such as those described above (e.g., immunoaffinity
purification).

Insulin-like proteins, whether produced by recombinant DNA techniques or by
chemical synthetic methods or by purification of native proteins, can include all or part of
the amino acid sequence substantially as depicted in any of FIGs 4-36 (SEQ ID NOs:1-18,
158-161, and 198-206), as well as fragments and other derivatives, and analogs thereof,
including proteins homologous thereto.

Structure of insulin-like genes and proteins

The structure of insulin-like genes and proteins of the invention can be analyzed by 
various methods known in the art, including genetic analysis and protein analysis. 

Genetic analysis methods for determining the structure of cloned DNA or cDNA
corresponding to an insulin-like gene include Southern hybridization, Northern hybridization,
restriction endonuclease mapping, and DNA sequence analysis. Accordingly, this invention
provides nucleic acid probes recognizing an insulin-like gene. For example, polymerase
chain reaction followed by Southern hybridization with an insulin-like gene-specific probe
can allow the detection of an insulin-like gene in DNA from various cell types. Methods of
amplification other than PCR are commonly known and can also be employed. In one
embodiment, Southern hybridization can be used to determine the genetic linkage of an
insulin-like gene. Northern hybridization analysis can be used to determine the expression
of an insulin-like gene. Various cell types, at various states of development or activity, can
be tested for insulin-like gene expression. The stringency of the hybridization conditions
for both Southern and Northern hybridization can be manipulated to ensure detection of
nucleic acids with the desired degree of relatedness to the specific insulin-like gene probe
used. Modifications of these methods and other methods commonly known in the art can be
used.

Restriction endonuclease mapping can be used to roughly determine the genetic
structure of an insulin-like gene. Restriction maps derived by restriction endonuclease
cleavage can be confirmed by DNA sequence analysis.

DNA sequence analysis can be performed by any techniques known in the art, such
as the method of Maxam and Gilbert (1980, Meth. Enzymol. 65:499-560), the Sanger
dideoxy method (Sanger et al., 1977, Proc. Natl. Acad. Sci. U.S.A. 74:5463), the use of T7
DNA polymerase (Tabor and Richardson, U.S. Patent No. 4,795,699), or use of an
automated DNA sequenator (e.g., Applied Biosystems, Foster City, California).

The amino acid sequence of an insulin-like protein can be derived by deduction from
the DNA sequence, or alternatively, by direct sequencing of the protein, e.g., with an
automated amino acid sequencer. An insulin-like protein sequence can be further
characterized by a hydrophilicity analysis (Hopp and Woods, 1981, Proc. Natl. Acad. Sci.
U.S.A. 78:3824). A hydrophilicity profile can be used to identify the hydrophobic and
hydrophilic regions of the insulin-like protein and the corresponding regions of the gene
sequence which encode such regions.
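
A hydrophilicity profile of the kind referred to above can be sketched in Python as a
sliding-window average. The scale values below are the Hopp-Woods values as commonly
tabulated and should be checked against the cited publication before use; the six-residue
window, the function names, and the example sequence are assumptions made for illustration
only.

```python
# Hopp-Woods hydrophilicity values as commonly tabulated (verify before use).
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "P": 0.0, "T": -0.4, "H": -0.5, "A": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5, "W": -3.4,
}


def hydrophilicity_profile(protein, window=6):
    """Average hydrophilicity over each window of consecutive residues."""
    scores = [HOPP_WOODS[aa] for aa in protein.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def most_hydrophilic_window(protein, window=6):
    """Return (start index, peptide, score) of the highest-scoring window."""
    profile = hydrophilicity_profile(protein, window)
    if not profile:
        raise ValueError("protein is shorter than the window")
    start = max(range(len(profile)), key=profile.__getitem__)
    return start, protein[start:start + window], profile[start]


# Made-up sequence: a charged stretch flanked by leucines.
print(most_hydrophilic_window("LLLLLLKDERKDLLLLLL"))
```

Windows with high average scores are candidate surface-exposed segments, which is why such
profiles are also used to pick peptide fragments for immunization.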

Secondary structural analysis (Chou and Fasman, 1974, Biochemistry 13:222) can
also be done, to identify regions of an insulin-like protein that assume specific secondary
structures.

Manipulation, translation, and secondary structure prediction, open reading frame
prediction and plotting, as well as determination of sequence homologies, can also be
accomplished using computer software programs available in the art.
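
As one illustration of open reading frame prediction, the Python sketch below scans the
three forward frames of a DNA sequence for stretches that begin with ATG and end at the
first in-frame stop codon. It is a simplified sketch, not any particular software package:
it ignores the reverse strand and introns, and the function name `find_orfs`, the
`min_codons` threshold, and the example sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=10):
    """Return sorted (start, end) pairs of forward-strand ORFs.

    Coordinates are 0-based and half-open; each ORF runs from its ATG
    through its stop codon. min_codons counts codons before the stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        codons = [dna[i:i + 3] for i in range(frame, len(dna) - 2, 3)]
        start = None
        for idx, codon in enumerate(codons):
            if start is None and codon == "ATG":
                start = idx                      # first ATG opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if idx - start >= min_codons:
                    orfs.append((frame + 3 * start, frame + 3 * idx + 3))
                start = None                     # look for the next ORF
    return sorted(orfs)


# Made-up sequence with one short ORF in the third forward frame.
example = "CC" + "ATG" + "GCT" * 4 + "TAA" + "G"
print(find_orfs(example, min_codons=5))
```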

Other methods of structural analysis include X-ray crystallography, nuclear magnetic
resonance spectroscopy and computer modeling.

Antibodies to insulin-like protein 

Insulin-like protein or its fragments (e.g., an insulin-like protein encoded by a
sequence of any of SEQ ID NOs:1-18, 158-161, and 198-206, or a subsequence thereof), or
other derivatives, or analogs thereof, may be used as an immunogen to generate antibodies.
Such antibodies include polyclonal, monoclonal, chimeric, single chain, Fab fragments, and
an Fab expression library. In another embodiment, antibodies to a domain (e.g., an
insulin-like receptor binding domain) of an insulin-like protein are produced. In a specific
embodiment, fragments of an insulin-like protein identified as hydrophilic are used as
immunogens for antibody production using art-known methods. Some examples of suitable
techniques include methods which provide for the production of antibody molecules by
continuous cell lines in culture; the production of monoclonal antibodies in germ-free
animals (see e.g., PCT/US90/02545); the use of human hybridomas (Cole et al., 1983, Proc.
Natl. Acad. Sci. U.S.A. 80:2026-2030); and transforming human B cells with EBV virus in
vitro (Cole et al., 1985, in Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, pp.
77-96). Additionally, known techniques can be used for the production of "chimeric
antibodies" (e.g., by splicing the genes from a mouse antibody molecule specific for an
insulin-like protein together with genes from a human antibody molecule of appropriate
biological activity), insulin-like-specific single chain antibodies, and Fab expression libraries
(e.g., to allow rapid and easy identification of monoclonal Fab fragments with the desired
specificity for insulin-like proteins, derivatives, or analogs). The foregoing antibodies can be
used against the insulin-like protein sequences described herein, e.g., for imaging these
proteins, measuring levels thereof, in diagnostic methods, etc.

Insulin-like proteins, derivatives and analogs

The invention further relates to insulin-like proteins and derivatives, fragments and
analogs thereof which can be encoded by the nucleic acids described above. The
insulin-like proteins comprise the amino acid sequence of any one of SEQ ID NOs:1-18,
158-161, and 198-206. In another aspect, the invention provides a protein consisting of or
comprising at least 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 25, or 30 amino acid residues of
any one of SEQ ID NOs:1-18, 158-161, and 198-206. In a preferred aspect, the C. elegans
insulin-like polypeptide has the structure of a Class IV polypeptide (as further described in
Example 2 below), such as the polypeptide defined by the amino acid sequence of any one
of SEQ ID NOs:12-15, 18, or 198-203. In particular aspects, the proteins, derivatives, or
analogs are of insulin-like proteins of animals, e.g., fly, frog, mouse, rat, pig, cow, dog,
monkey, human, worm, or plant.

In a specific embodiment, the derivative or analog is functionally active, i.e.,
capable of exhibiting one or more functional activities associated with a full-length,
wild-type insulin-like protein. As one example, such derivatives or analogs which have the
desired immunogenicity or antigenicity can be used in immunoassays, for immunization, for
inhibition of insulin-like activity, etc. As another example, such derivatives or analogs
which have the desired binding activity can be used for binding to the daf-2 gene product.
As yet another example, such derivatives or analogs which have the desired binding activity
can be used for binding to a binding protein specific for a known insulin-like protein (see
e.g., Clemmons, 1993, Mol. Reprod. Dev. 35:368-374; Loddick et al., 1998, Proc. Natl.
Acad. Sci. U.S.A. 95:1894-1898). Derivatives or analogs that retain, or alternatively lack or
inhibit, a desired insulin-like protein property-of-interest (e.g., binding to an insulin-like
protein binding partner), can be used as inducers, or inhibitors, respectively, of such
property and its physiological correlates. A specific embodiment relates to an insulin-like
protein fragment that can be bound by an anti-insulin-like protein antibody. Derivatives or
analogs of an insulin-like protein can be tested for the desired activity by procedures
discussed herein and also those known in the art.

Insulin-like derivatives can be made by altering insulin-like sequences by
substitutions, additions (e.g., insertions) or deletions that provide for functionally equivalent
molecules. Due to the degeneracy of nucleotide coding sequences, other DNA sequences
which encode substantially the same amino acid sequence as an insulin-like gene may be
used in the practice of the present invention. These can include nucleotide sequences
comprising all or portions of an insulin-like gene which is altered by the substitution of
different codons that encode a functionally equivalent amino acid residue within the
sequence, thus producing a silent change. Likewise, the insulin-like derivatives of the
invention include, but are not limited to, those containing, as a primary amino acid
sequence, all or part of the amino acid sequence of an insulin-like protein including altered
sequences in which functionally equivalent amino acid residues are substituted for residues
within the sequence resulting in a silent change. For example, one or more amino acid
residues within the sequence can be substituted by another amino acid of a similar polarity
which acts as a functional equivalent, resulting in a silent alteration. Substitutions for an
amino acid within the sequence may be selected from other members of the class to which
the amino acid belongs. For example, the nonpolar (hydrophobic) amino acids include
alanine, leucine, isoleucine, valine, proline, phenylalanine, tryptophan and methionine. The
polar neutral amino acids include glycine, serine, threonine, cysteine, tyrosine, asparagine,
and glutamine. The positively charged (basic) amino acids include arginine, lysine and
histidine. The negatively charged (acidic) amino acids include aspartic acid and glutamic
acid. Such substitutions are generally understood to be conservative substitutions.
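
The four amino acid classes listed above can be encoded directly, as in the following
Python sketch, which flags positions where a variant differs from a native sequence by a
substitution outside the native residue's class. The class membership is taken from the
preceding paragraph; the function names and example peptides are hypothetical, and the
sketch handles only equal-length sequences (substitutions, not insertions or deletions).

```python
# Amino acid classes as listed in the text, in one-letter code.
CLASSES = {
    "nonpolar": set("ALIVPFWM"),
    "polar_neutral": set("GSTCYNQ"),
    "basic": set("RKH"),
    "acidic": set("DE"),
}


def residue_class(aa):
    """Return the name of the class containing a residue."""
    for name, members in CLASSES.items():
        if aa.upper() in members:
            return name
    raise ValueError(f"not a standard amino acid: {aa!r}")


def is_conservative(native, substitute):
    """True when both residues belong to the same class."""
    return residue_class(native) == residue_class(substitute)


def nonconservative_positions(native_seq, variant_seq):
    """0-based positions where the variant leaves the native residue's class."""
    if len(native_seq) != len(variant_seq):
        raise ValueError("sequences must have the same length")
    return [i for i, (a, b) in enumerate(zip(native_seq, variant_seq))
            if not is_conservative(a, b)]


# Made-up peptides: A->V and D->E stay within class, D->K does not.
print(nonconservative_positions("ALDK", "VLEK"))
print(nonconservative_positions("ALDK", "ALKK"))
```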

The invention also provides proteins consisting of or comprising a fragment of an
insulin-like protein consisting of at least 6 (contiguous) amino acids of the insulin-like
protein. In other embodiments, the fragment consists of at least 10, at least 15, at least 20
or at least 50 amino acids of the insulin-like protein. In specific embodiments, such
fragments are not larger than 35, 100 or 200 amino acids. Derivatives or analogs of
insulin-like proteins include those molecules comprising regions that are substantially
homologous to an insulin-like protein or fragment thereof (e.g., in various embodiments, at
least 60% or 70% or 80% or 90% or 95% identity over an amino acid sequence of identical
size or when compared to an aligned sequence in which the alignment is done by a computer
homology program known in the art) or whose encoding nucleic acid is capable of
hybridizing to a coding insulin-like gene sequence, under high stringency, moderate
stringency, or low stringency conditions.
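
The identity thresholds recited above can be checked with a short Python sketch. It assumes
the two sequences have already been aligned to identical length (gaps written as "-" and
counted as mismatches); the alignment step itself is not shown, and the function names and
example sequences are hypothetical.

```python
THRESHOLDS = (60, 70, 80, 90, 95)  # percent identity levels named in the text


def percent_identity(seq_a, seq_b):
    """Percent of aligned positions carrying the same residue in both sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of identical size")
    matches = sum(a == b and a != "-" for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


def thresholds_met(identity):
    """Return the recited thresholds that a given identity value satisfies."""
    return [t for t in THRESHOLDS if identity >= t]


# Made-up ten-residue sequences differing at the last position.
identity = percent_identity("ACDEFGHIKL", "ACDEFGHIKV")
print(identity, thresholds_met(identity))
```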

The insulin-like derivatives and analogs of the invention can be produced by various
methods known in the art. The manipulations which result in their production can occur at
the gene or protein level. For example, a cloned insulin-like gene sequence can be modified
by any of numerous strategies known in the art. The sequence can be cleaved at appropriate
sites with restriction endonuclease(s), followed by further enzymatic modification if desired,
isolated, and ligated in vitro. In the production of a modified gene encoding a derivative or
analog of an insulin-like protein, care should be taken to ensure that the modified gene
remains within the same translational reading frame as the native protein, uninterrupted by
translational stop signals, in the gene region where the desired insulin-like protein activity is
encoded.
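
The reading-frame precaution described above lends itself to a simple automated check. The
Python sketch below reports two problems in a modified coding sequence: a length that is not
a multiple of three, and stop codons occurring before the final codon. It assumes the
sequence starts at the first base of the first codon and uses the standard stop codons; the
function name and example sequences are hypothetical. A passing result does not rule out a
compensating pair of frameshifts, which would need a comparison against the native sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_modified_cds(cds):
    """Return a list of reading-frame problems found in a coding sequence."""
    cds = cds.upper()
    problems = []
    if len(cds) % 3 != 0:
        problems.append("length is not a multiple of 3 (possible frameshift)")
    # Split into complete codons, ignoring any trailing partial codon.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    # The final codon may legitimately be a stop, so it is not checked.
    for idx, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            problems.append(f"premature stop codon {codon} at codon {idx + 1}")
    return problems


# Made-up sequences: intact, one-base deletion, and internal stop.
for candidate in ("ATGGCTTGTTAA", "ATGGCTGTTAA", "ATGTAAGCTTAA"):
    print(candidate, check_modified_cds(candidate))
```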

Additionally, an insulin-like nucleic acid sequence can be mutated in vitro or in vivo,
to create and/or destroy translation, initiation, and/or termination sequences, or to create
variations in coding regions and/or to form new restriction endonuclease sites or destroy
preexisting ones, to facilitate further in vitro modification. Any technique for mutagenesis
known in the art can be used, including but not limited to, chemical mutagenesis, in vitro
site-directed mutagenesis, use of TAB® linkers (Pharmacia), etc.

Manipulations of an insulin-like protein sequence may also be made at the protein
level. Included within the scope of the invention are insulin-like protein fragments or other
derivatives or analogs which are differentially modified during or after translation, e.g., by
glycosylation, acetylation, phosphorylation, amidation, derivatization by known
protecting/blocking groups, proteolytic cleavage, linkage to an antibody molecule or other
cellular ligand, etc. Any of numerous chemical modifications may be carried out by known
techniques, including but not limited to specific chemical cleavage by cyanogen bromide,
trypsin, chymotrypsin, papain, V8 protease, NaBH4, acetylation, formylation, oxidation,
reduction, metabolic synthesis in the presence of tunicamycin, etc.

In addition, analogs and derivatives of an insulin-like protein can be chemically 
synthesized. For example, a peptide corresponding to a portion of an insulin-like protein 
which comprises the desired domain, or which mediates the desired activity in vitro, can be 
synthesized by use of a peptide synthesizer. Furthermore, if desired, nonclassical amino
acids or chemical amino acid analogs can be introduced as a substitution or addition into the
insulin-like sequence. Non-classical amino acids include the D-isomers of the common
amino acids, α-amino isobutyric acid, 4-aminobutyric acid, Abu, 2-amino butyric acid,
γ-Abu, ε-Ahx, 6-amino hexanoic acid, Aib, 2-amino isobutyric acid, 3-amino propionic
acid, ornithine, norleucine, norvaline, hydroxyproline, sarcosine, citrulline, cysteic acid,
t-butylglycine, t-butylalanine, phenylglycine, cyclohexylalanine, β-alanine, fluoro-amino
acids, designer amino acids such as β-methyl amino acids, Cα-methyl amino acids,
Nα-methyl amino acids, and amino acid analogs in general. Furthermore, the amino acid can be
D (dextrorotary) or L (levorotary).

Chimeric or fusion proteins can be made comprising an insulin-like protein or
fragment thereof (preferably consisting of at least a domain or motif of the insulin-like
protein, or at least 6, and preferably at least 10 amino acids of the insulin-like protein)
joined at its amino- or carboxy-terminus via a peptide bond to an amino acid sequence of a
different protein. Such a chimeric protein can be produced by any known method,
including: recombinant expression of a nucleic acid encoding the protein (comprising an
insulin-like-coding sequence joined in-frame to a coding sequence for a different protein);
ligating the appropriate nucleic acid sequences encoding the desired amino acid sequences
to each other in the proper coding frame, and expressing the chimeric product; and protein
synthetic techniques, e.g., by use of a peptide synthesizer.

The insulin-like derivative can be a molecule comprising a region of homology with
an insulin-like protein. For example, a first protein region can be considered "homologous"
to a second protein region when the amino acid sequence of the first region is at least 30%,
40%, 50%, 60%, 70%, 75%, 80%, 90%, or 95% identical, when compared to any sequence
in the second region of an equal number of amino acids as the number contained in the first
region or when compared to an aligned sequence of the second region that has been aligned
by a computer homology program known in the art. For example, a molecule can comprise
one or more regions homologous to an insulin-like domain or a portion thereof.
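
The first comparison described above (percent identity against any stretch of the second region having the same number of amino acids) can be sketched as follows. This is an ungapped sliding-window comparison offered only as an illustration; it is not a substitute for a computer homology program, and the function names and default threshold argument are assumptions.

```python
# Illustrative sketch only; names are hypothetical.
def percent_identity(a: str, b: str) -> float:
    """Percent of positions at which two equal-length sequences match."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def best_window_identity(first: str, second: str) -> float:
    """Highest percent identity of `first` against every window of
    `second` that has the same number of residues as `first`."""
    n = len(first)
    if n == 0 or n > len(second):
        raise ValueError("first region must be non-empty and no longer than second")
    return max(
        percent_identity(first, second[i:i + n])
        for i in range(len(second) - n + 1)
    )


def is_homologous(first: str, second: str, threshold: float = 30.0) -> bool:
    """Apply one of the recited identity thresholds (30% by default)."""
    return best_window_identity(first, second) >= threshold
```

Any of the recited thresholds (30% through 95%) can be passed as `threshold`; the aligned-sequence alternative would require a gapped alignment, which this sketch does not attempt.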

Fragments of an insulin-like protein can be those fragments of the respective
insulin-like proteins of the invention that are most homologous to specific fragments of a
human or mouse insulin-like protein, as identified by protein analysis methods.

Insulin-like fragments, and derivatives of such fragments, may comprise or consist of
one or more domains of an insulin-like protein, such as an insulin-like B peptide domain, an
insulin-like A peptide domain, and/or an insulin-like connecting (C) peptide domain (or
functional portion thereof). In particular examples, the insulin-like protein derivative has
either an A peptide domain or a B peptide domain. Such a protein may retain such domains
separated by a peptide spacer. The spacer may be the same as or different from an
insulin-like connecting (C) peptide.

An insulin-like protein derivative may comprise one or more domains (or functional
portion(s) thereof) of an insulin-like protein, and one or more mutant domains (e.g., due to
deletion or point mutation(s)) of an insulin-like protein (e.g., such that the mutant domain
has decreased function).



Proteins which interact with insulin-like proteins

The present invention further provides methods of identifying or screening for 
proteins which interact with C. elegans insulin-like proteins, or derivatives, fragments or 
analogs thereof. A preferred method is a yeast two hybrid assay system or a variation
thereof. The yeast two-hybrid method has been used to analyze IGF-1-receptor interactions
(see Zhu and Kahn, 1997, Proc. Natl. Acad. Sci. U.S.A. 94:13063-13068). Derivatives
(e.g., fragments) and analogs of a protein can also be assayed for binding to a binding
partner by any method known in the art, for example, immunoprecipitation with an antibody
that binds to the protein in a complex followed by analysis by size fractionation of the
immunoprecipitated proteins (e.g., by denaturing polyacrylamide gel electrophoresis),
Western analysis, non-denaturing gel electrophoresis, etc.

Known methods can be used for assaying and screening fragments, derivatives and
analogs of C. elegans insulin-like protein interacting proteins (for binding to a C. elegans
insulin-like peptide). Derivatives, analogs and fragments of proteins that interact with a C.
elegans insulin-like protein can be identified by means of a yeast two hybrid assay system
(Fields and Song, 1989, Nature 340:245-246 and U.S. Patent No. 5,283,173). Because the
interactions are screened for in yeast, the intermolecular protein interactions detected in this
system occur under physiological conditions that mimic the conditions in mammalian cells.
This feature facilitates identification of proteins capable of interaction with a C. elegans
insulin-like protein from species other than C. elegans.

Identification of interacting proteins by the improved yeast two hybrid system is
based upon the detection of expression of a reporter gene, the transcription of which is
dependent upon the reconstitution of a transcriptional regulator by the interaction of two
proteins, each fused to one half of the transcriptional regulator. The "bait" (i.e., C. elegans
insulin-like protein or derivative or analog thereof) and "prey" proteins (proteins to be tested
for ability to interact with the bait) are expressed as fusion proteins to a DNA binding
domain, and to a transcriptional regulatory domain, respectively, or vice versa. In various
specific embodiments, the prey has a complexity of at least about 50, about 100, about 500,
about 1,000, about 5,000, about 10,000, or about 50,000; or has a complexity in the range of
about 25 to about 100,000, about 100 to about 100,000, about 50,000 to about 100,000, or
about 100,000 to about 500,000. For example, the prey population can be one or more
nucleic acids encoding mutants of a protein (e.g., as generated by site-directed mutagenesis
or another method of making mutations in a nucleotide sequence). Preferably, the prey
populations are proteins encoded by DNA, e.g., cDNA or genomic DNA or
synthetically-generated DNA. For example, the populations can be expressed from chimeric genes
comprising cDNA sequences from an un-characterized sample of a population of cDNA
from mRNA. In one embodiment, recombinant biological libraries expressing random
peptides can be used as the source of prey nucleic acids.

The invention provides methods of screening for inhibitors or enhancers of the
protein interactants identified herein. Briefly, the protein-protein interaction assay can be
carried out as described herein, except that it is done in the presence of one or more
candidate molecules. An increase or decrease in reporter gene activity relative to that
present when the one or more candidate molecules are absent indicates that the candidate
molecule has an effect on the interacting pair. In a preferred method, inhibition of the
interaction is selected for (i.e., inhibition of the interaction is necessary for the cells to
survive), for example, where the interaction activates the URA3 gene, causing yeast to die in
medium containing the chemical 5-fluoroorotic acid (Rothstein, 1983, Meth. Enzymol.
101:167-180). The identification of inhibitors of such interactions can also be
accomplished, for example, using competitive inhibitor assays, as described above.
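
The comparison of reporter gene activity with and without a candidate molecule can be sketched as a simple ratio test. The function name, the labels it returns, and the 20% tolerance are assumptions chosen for illustration; no particular cut-off is specified herein.

```python
# Illustrative sketch only; the tolerance value is an assumption.
def classify_candidate(activity_with: float,
                       activity_without: float,
                       tolerance: float = 0.2) -> str:
    """Classify a candidate molecule by the change in reporter activity
    relative to the same assay run in the candidate's absence."""
    if activity_without <= 0:
        raise ValueError("baseline reporter activity must be positive")
    ratio = activity_with / activity_without
    if ratio < 1.0 - tolerance:
        return "inhibitor"   # reporter activity decreased
    if ratio > 1.0 + tolerance:
        return "enhancer"    # reporter activity increased
    return "no effect"       # change falls within the tolerance band
```

In practice the tolerance would be set from the variability of replicate assays rather than fixed in advance.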

In general, proteins of the bait and prey populations are provided as fusion
(chimeric) proteins (preferably by recombinant expression of a chimeric coding sequence)
comprising each protein contiguous to a pre-selected sequence. For one population, the
pre-selected sequence is a DNA binding domain. The DNA binding domain can be any DNA
binding domain, as long as it specifically recognizes a DNA sequence within a promoter.
For example, the DNA binding domain is of a transcriptional activator or inhibitor. For the
other population, the pre-selected sequence is an activator or inhibitor domain of a
transcriptional activator or inhibitor, respectively. The regulatory domain alone (i.e., not as a
fusion to a protein sequence) and the DNA-binding domain alone preferably do not
detectably interact (so as to avoid false positives in the assay). The assay system further
includes a reporter gene operably linked to a promoter that contains a binding site for the
DNA binding domain of the transcriptional activator (or inhibitor). Accordingly, in the
method of the present invention, binding of a C. elegans insulin-like fusion protein
to a prey fusion protein leads to reconstitution of a transcriptional activator (or inhibitor)
which activates (or inhibits) expression of the reporter gene. The activation (or inhibition)
of transcription of the reporter gene occurs intracellularly, e.g., in prokaryotic or eukaryotic
cells, preferably in cell culture.

The promoter that is operably linked to the reporter gene nucleotide sequence can be
a native or non-native promoter of the nucleotide sequence, and the DNA binding site(s)
that are recognized by the DNA binding domain portion of the fusion protein can be native
to the promoter (if the promoter normally contains such binding site(s)) or non-native to the
promoter. Thus, for example, one or more tandem copies (e.g., four or five copies) of the
appropriate DNA binding site can be introduced upstream of the TATA box in the desired
promoter (e.g., in the area of about position -100 to about -400). In a preferred aspect, 4 or
5 tandem copies of the 17 bp UAS (GAL4 DNA binding site) are introduced upstream of
the TATA box in the desired promoter, which is upstream of the desired coding sequence
for a selectable or detectable marker. In a preferred embodiment, the GAL1-10 promoter is
operably fused to the desired nucleotide sequence; the GAL1-10 promoter already contains 5
binding sites for GAL4.

Alternatively, the transcriptional activation binding site of the desired gene(s) can be
deleted and replaced with GAL4 binding sites (Bartel et al., 1993, BioTechniques
14:920-924; Chasman et al., 1989, Mol. Cell. Biol. 9:4746-4749). The reporter gene preferably
contains the sequence encoding a detectable or selectable marker, the expression of which is
regulated by the transcriptional activator, such that the marker is either turned on or off in
the cell in response to the presence of a specific interaction. Preferably, the assay is carried
out in the absence of background levels of the transcriptional activator (e.g., in a cell that is
mutant or otherwise lacking in the transcriptional activator). More than one reporter gene
can be used to detect transcriptional activation, e.g., one reporter gene encoding a detectable
marker and one or more reporter genes encoding different selectable markers. The
detectable marker can be any molecule that can give rise to a detectable signal, e.g., a
fluorescent protein or a protein that can be readily visualized or that is recognizable by a
specific antibody. The selectable marker can be any protein molecule that confers the
ability to grow under conditions that do not support the growth of cells not expressing the
selectable marker, e.g., the selectable marker is an enzyme that provides an essential
nutrient and the cell in which the interaction assay occurs is deficient in the enzyme and the
selection medium lacks such nutrient. The reporter gene can either be under the control of
the native promoter that naturally contains a binding site for the DNA binding protein, or
under the control of a heterologous or synthetic promoter.

The activation domain and DNA binding domain used in the assay can be from a
wide variety of transcriptional activator proteins, as long as these transcriptional activators
have separable binding and transcriptional activation domains. For example, the GAL4
protein of S. cerevisiae (Ma et al., 1987, Cell 48:847-853), the GCN4 protein of S.
cerevisiae (Hope and Struhl, 1986, Cell 46:885-894), the ARD1 protein of S. cerevisiae
(Thukral et al., 1989, Mol. Cell. Biol. 9:2360-2369), and the human estrogen receptor
(Kumar et al., 1987, Cell 51:941-951), have separable DNA binding and activation
domains. The DNA binding domain and activation domain that are employed in the fusion
proteins need not be from the same transcriptional activator. In a specific embodiment, a
GAL4 or LEXA DNA binding domain is employed. In another specific embodiment, a
GAL4 or herpes simplex virus VP16 (Triezenberg et al., 1988, Genes Dev. 2:730-742)
activation domain is employed. In a specific embodiment, amino acids 1-147 of GAL4 (Ma
et al., 1987, Cell 48:847-853; Ptashne et al., 1990, Nature 346:329-331) are the DNA binding
domain, and amino acids 411-455 of VP16 (Triezenberg et al., 1988, Genes Dev.
2:730-742; Cress et al., 1991, Science 251:87-90) comprise the activation domain.

In a preferred embodiment, the yeast transcription factor GAL4 is reconstituted by
protein-protein interaction and the host strain is mutant for GAL4. In another embodiment,
the DNA-binding domain is Ace1N and/or the activation domain is Ace1, the DNA binding
and activation domains of the Ace1 protein, respectively. Ace1 is a yeast protein that
activates transcription from the CUP1 operon in the presence of divalent copper. CUP1
encodes metallothionein, which chelates copper, and the expression of CUP1 protein allows
growth in the presence of copper, which is otherwise toxic to the host cells. The reporter
gene can also be a CUP1-lacZ fusion that expresses the enzyme beta-galactosidase
(detectable by routine chromogenic assay) upon binding of a reconstituted Ace1N
transcriptional activator (see Chaudhuri et al., 1995, FEBS Letters 357:221-226). In another
embodiment, the DNA binding domain of the human estrogen receptor is used, with a
reporter gene driven by one or three estrogen receptor response elements (Le Douarin et al.,
1995, Nucl. Acids Res. 23:876-878).

The DNA binding domain and the transcriptional activator/inhibitor domain each
preferably has a nuclear localization signal (see Ylikomi et al., 1992, EMBO J.
11:3681-3694; Dingwall and Laskey, 1991, TIBS 16:479-481) functional in the cell in which the
fusion proteins are to be expressed.

To facilitate isolation of the encoded proteins, the fusion constructs can further
contain sequences encoding affinity tags such as glutathione-S-transferase or
maltose-binding protein or an epitope of an available antibody, for affinity purification (e.g., binding
to glutathione, maltose, or a particular antibody specific for the epitope, respectively) (Allen
et al., 1995, TIBS 20:511-516). In another embodiment, the fusion constructs further
comprise bacterial promoter sequences for recombinant production of the fusion protein in
bacterial cells.

The host cell in which the interaction assay occurs can be any cell, prokaryotic or
eukaryotic, in which transcription of the reporter gene can occur and be detected, such as
mammalian (e.g., monkey, mouse, rat, human, bovine), chicken, bacterial, or insect cells,
and is preferably a yeast cell. Expression constructs encoding and capable of expressing the
binding domain fusion proteins, the transcriptional activation domain fusion proteins, and
the reporter gene product(s) are provided within the host cell, by mating of cells containing
the expression constructs, or by cell fusion, transformation, electroporation, microinjection,
etc. When the assay is carried out in mammalian cells (e.g., hamster cells, HeLa cells), the
DNA binding domain can be the GAL4 DNA binding domain, the activation domain can be
the herpes simplex virus VP16 transcriptional activation domain, and the reporter gene can
contain the desired coding sequence operably linked to a minimal promoter element from
the adenovirus E1B gene driven by several GAL4 DNA binding sites (see Fearon et al.,
1992, Proc. Natl. Acad. Sci. U.S.A. 89:7958-7962). The host cell used should not express
an endogenous transcription factor that binds to the same DNA site as that recognized by the
DNA binding domain fusion population. Also, preferably, the host cell is mutant or
otherwise lacking in an endogenous, functional form of the reporter gene(s) used in the
assay.

Various vectors and host strains for expression of the two fusion protein populations
in yeast are known and can be used (see e.g., U.S. Patent No. 5,468,614; Bartel et al.,
1993, Cellular Interactions in Development, Hartley, ed., Practical Approach Series xviii,
IRL Press at Oxford University Press, New York, NY, pp. 153-179; Fields and Sternglanz,
1994, Trends In Genetics 10:286-292). Any yeast strain, or derivative strains made
therefrom, known in the art can be used, including N105, N106, N1051, N1061, and YULH.
Other exemplary strains that can be used in the assay of the invention also include:

Y190: MATa, ura3-52, his3-200, lys2-801, ade2-101, trp1-901, leu2-3,112, gal4Δ,
gal80Δ, cyhr2, LYS2::GAL1UAS-HIS3TATA-HIS3, URA3::GAL1UAS-GAL1TATA-lacZ; Harper et al.,
1993, Cell 75:805-816, available from Clontech, Palo Alto, CA. Y190 contains HIS3 and
lacZ reporter genes driven by GAL4 binding sites.

CG-1945: MATa, ura3-52, his3-200, lys2-801, ade2-101, trp1-901, leu2-3,112,
gal4-542, gal80-538, cyhr2, LYS2::GAL1UAS-HIS3TATA-HIS3, URA3::GAL1UAS17mers(x3)-CYC1TATA-lacZ,
available from Clontech, Palo Alto, CA. CG-1945 contains HIS3 and lacZ
reporter genes driven by GAL4 binding sites.

Y187: MATα, ura3-52, his3-200, ade2-101, trp1-901, leu2-3,112, gal4Δ, gal80Δ,
URA3::GAL1UAS-GAL1TATA-lacZ, available from Clontech, Palo Alto, CA. Y187 contains a
lacZ reporter gene driven by GAL4 binding sites.

SFY526: MATa, ura3-52, his3-200, lys2-801, ade2-101, trp1-901, leu2-3,112,
gal4-542, gal80-538, canr, URA3::GAL1-lacZ, available from Clontech, Palo Alto, CA.
SFY526 contains HIS3 and lacZ reporter genes driven by GAL4 binding sites.

HF7c: MATa, ura3-52, his3-200, lys2-801, ade2-101, trp1-901, leu2-3,112,
gal4-542, gal80-538, LYS2::GAL1-HIS3, URA3::GAL1UAS17mers(x3)-CYC1-lacZ, available from
Clontech, Palo Alto, CA. HF7c contains HIS3 and lacZ reporter genes driven by GAL4
binding sites.

YRG-2: MATa, ura3-52, his3-200, lys2-801, ade2-101, trp1-901, leu2-3,112,
gal4-542, gal80-538, LYS2::GAL1UAS-GAL1TATA-HIS3, URA3::GAL1UAS17mers(x3)-CYC1-lacZ,
available from Stratagene, La Jolla, CA. YRG-2 contains HIS3 and lacZ reporter genes
driven by GAL4 binding sites.

If not already lacking in endogenous reporter gene activity, cells mutant in the
reporter gene may be selected by known methods, or the cells can be made mutant in the
target reporter gene by known gene-disruption methods prior to introducing the reporter
gene (Rothstein, 1983, Meth. Enzymol. 101:202-211).

In a specific embodiment, plasmids encoding the different fusion protein populations
can be introduced simultaneously into a single host cell (e.g., a haploid yeast cell)
containing one or more reporter genes, by co-transformation, to conduct the assay for
protein-protein interactions. Or, preferably, the two fusion protein populations are
introduced into a single cell either by mating (e.g., for yeast cells) or cell fusions (e.g., of
mammalian cells). In a mating type assay, conjugation of haploid yeast cells of opposite
mating type that have been transformed with a binding domain fusion expression construct
(preferably a plasmid) and an activation (or inhibitor) domain fusion expression construct
(preferably a plasmid), respectively, will deliver both constructs into the same diploid cell.
The mating type of a yeast strain may be manipulated by transformation with the HO gene
(Herskowitz and Jensen, 1991, Meth. Enzymol. 194:132-146).
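
The logic of the mating type assay, in which only haploids of opposite mating type conjugate and the resulting diploid carries both constructs, can be sketched as a toy model. The class, the function names, and the construct labels "BD-bait" and "AD-prey" are hypothetical and serve only to illustrate the bookkeeping.

```python
# Illustrative toy model only; names and labels are hypothetical.
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class HaploidCell:
    mating_type: str  # "a" or "alpha"
    construct: str    # label of the fusion expression construct carried


def mate(first: HaploidCell, second: HaploidCell) -> Optional[FrozenSet[str]]:
    """Return the constructs carried by the diploid, or None when the
    two haploids share a mating type and so cannot conjugate."""
    if first.mating_type == second.mating_type:
        return None
    return frozenset({first.construct, second.construct})


def carries_both_fusions(diploid: Optional[FrozenSet[str]]) -> bool:
    """True only if the diploid holds both the binding domain fusion
    and the activation domain fusion constructs."""
    return diploid is not None and {"BD-bait", "AD-prey"} <= diploid
```

For example, mating an "a" cell carrying the binding domain fusion with an "alpha" cell carrying the activation domain fusion gives a diploid for which `carries_both_fusions` is true.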

In a preferred embodiment, a yeast interaction mating assay is employed using two
different types of host cells, strain-type a and alpha of the yeast Saccharomyces cerevisiae.
The host cell preferably contains at least two reporter genes, each with one or more binding
sites for the DNA-binding domain (e.g., of a transcriptional activator). The activator
domain and DNA binding domain are each parts of chimeric proteins formed from the two
respective populations of proteins. One strain of host cells, for example the a strain,
contains fusions of the library of nucleotide sequences with the DNA-binding domain of a
transcriptional activator, such as GAL4. The hybrid proteins expressed in this set of host
cells are capable of recognizing the DNA-binding site in the promoter or enhancer region in
the reporter gene construct. The second set of yeast host cells, for example, the alpha strain,
contains nucleotide sequences encoding fusions of a library of DNA sequences fused to the
activation domain of a transcriptional activator.

In a preferred embodiment, the fusion protein constructs are introduced into the host
cell as a set of plasmids. These plasmids are preferably capable of autonomous replication
in a host yeast cell and preferably can also be propagated in E. coli. The plasmid contains a
promoter directing the transcription of the DNA binding or activation domain fusion genes,
and a transcriptional termination signal. The plasmid also preferably contains a selectable
marker gene, permitting selection of cells containing the plasmid. The plasmid can be
single-copy or multi-copy. Single-copy yeast plasmids that have the yeast centromere may
also be used to express the activation and DNA binding domain fusions (Elledge et al.,
1988, Gene 70:303-312).

The fusion constructs can be introduced directly into the yeast chromosome via
homologous recombination mediated through yeast sequences that are not essential for
vegetative growth of yeast, e.g., the MER2, MER1, ZIP1, REC102, or MEI4 gene.

Bacteriophage vectors can also be used to express the DNA binding domain and/or
activation domain fusion proteins. Libraries can generally be prepared faster and more
easily from bacteriophage vectors than from plasmid vectors.

Methods can be used for detecting one or more protein-protein interactions
comprising (a) recombinantly expressing a C. elegans insulin-like protein or a derivative or
analog thereof in a first population of yeast cells being of a first mating type and comprising
a first fusion protein containing the C. elegans insulin-like sequence and a DNA binding
domain, wherein said first population of yeast cells contains a first nucleotide sequence
operably linked to a promoter driven by one or more DNA binding sites recognized by said
DNA binding domain such that an interaction of said first fusion protein with a second
fusion protein, said second fusion protein comprising a transcriptional activation domain,
results in increased transcription of said first nucleotide sequence; (b) negatively selecting to
eliminate those yeast cells in said first population in which said increased transcription of
said first nucleotide sequence occurs in the absence of said second fusion protein; (c)
recombinantly expressing in a second population of yeast cells of a second mating type
different from said first mating type, a plurality of said second fusion proteins, each second
fusion protein comprising a sequence of a fragment, derivative or analog of a protein and an
activation domain of a transcriptional activator, in which the activation domain is the same
in each said second fusion protein; (d) mating said first population of yeast cells with said
second population of yeast cells to form a third population of diploid yeast cells, wherein
said third population of diploid yeast cells contains a second nucleotide sequence operably
linked to a promoter driven by a DNA binding site recognized by said DNA binding domain
such that an interaction of a first fusion protein with a second fusion protein results in
increased transcription of said second nucleotide sequence, in which the first and second
nucleotide sequences can be the same or different; and (e) detecting said increased
transcription of said first and/or second nucleotide sequence, thereby detecting an
interaction between a first fusion protein and a second fusion protein.

In a preferred embodiment, the bait C. elegans insulin-like sequence and the prey
library of chimeric genes are combined by mating the two yeast strains on solid media for a
period of approximately 6-8 hours. Alternatively, the mating can be performed in liquid
media. The resulting diploids contain both kinds of chimeric genes, i.e., the DNA-binding
domain fusion and the activation domain fusion.

Preferred reporter genes include the URA3, HIS3 and/or the lacZ genes (see e.g.,
Rose and Botstein, 1983, Meth. Enzymol. 101:167-180) operably linked to GAL4
DNA-binding domain recognition elements. Other reporter genes comprise the functional coding
sequences for, but not limited to, Green Fluorescent Protein (GFP) (Cubitt et al., 1995,
Trends Biochem. Sci. 20:448-455), luciferase, LEU2, LYS2, ADE2, TRP1, CAN1, CYH2,
GUS, CUP1 or chloramphenicol acetyl transferase (CAT). Expression of LEU2, LYS2,
ADE2 and TRP1 is detected by growth in specific defined media; GUS and CAT can be
monitored by well known enzyme assays; and CAN1 and CYH2 are detected by selection in
the presence of canavanine and cycloheximide, respectively. With respect to GFP, the natural
fluorescence of the protein is detected, or a modified GFP having modified fluorescence is
detected.
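
For bookkeeping, the reporter-to-readout pairings recited above can be held in a simple lookup table. The table below only restates the preceding paragraph; the dictionary name, the function name, and the wording of the labels are assumptions.

```python
# Illustrative lookup only; it restates the pairings in the paragraph above.
DETECTION = {
    "LEU2": "growth in defined medium",
    "LYS2": "growth in defined medium",
    "ADE2": "growth in defined medium",
    "TRP1": "growth in defined medium",
    "GUS": "enzyme assay",
    "CAT": "enzyme assay",
    "CAN1": "selection with canavanine",
    "CYH2": "selection with cycloheximide",
    "GFP": "fluorescence",
}


def detection_method(reporter: str) -> str:
    """Return the readout recited for a reporter gene, or raise if the
    reporter is not covered by the table."""
    try:
        return DETECTION[reporter]
    except KeyError:
        raise ValueError(f"no detection method listed for {reporter!r}") from None
```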

Transcription of the reporter gene can be detected by a linked replication assay. For
example, as described by Vasavada et al., 1991, Proc. Natl. Acad. Sci. U.S.A.
88:10686-10690, expression of SV40 large T antigen is under the control of the E1B promoter
responsive to GAL4 binding sites. The replication of a plasmid containing the SV40 origin
of replication indicates the reconstruction of the GAL4 protein and a protein-protein
interaction. Alternatively, a polyoma virus replicon can be employed (Vasavada et al.,
1991, Proc. Natl. Acad. Sci. U.S.A. 88:10686-10690).

The expression of reporter genes that encode proteins can also be detected using
immunoassay methods. Alam and Cook (1990, Anal. Biochem. 188:245-254) disclose
examples of detectable marker genes that can be operably linked to a transcriptional
regulatory region responsive to a reconstituted transcriptional activator, and thus used as
reporter genes.

The activation of reporter genes like URA3 or HIS3 enables the cells to grow in the
absence of uracil or histidine, respectively, and hence serves as a selectable marker. Thus,
after mating, the cells exhibiting protein-protein interactions are selected by the ability to
grow in media lacking a nutritional component, such as uracil or histidine (referred to as
-URA (minus URA) and -HIS (minus HIS) medium, respectively). The -HIS medium
preferably contains 3-amino-1,2,4-triazole (3-AT), which is a competitive inhibitor of the
HIS3 gene product, and thus requires higher levels of transcription in the selection (see
Durfee et al., 1993, Genes Dev. 7:555-569). Similarly, 6-azauracil, which is an inhibitor of
the URA3 gene product, can be included in -URA medium (Le Douarin et al., 1995, Nucl.
Acids Res. 23:876-878). URA3 gene activity can also be detected and/or measured by
determining the activity of its gene product, orotidine-5'-monophosphate decarboxylase
(Pierrat et al., 1992, Gene 119:237-245; Wolcott et al., 1966, Biochem. Biophys. Acta
122:532-534). In other embodiments of the present invention, the activities of the reporter
genes like GFP or lacZ are monitored by measuring a detectable signal (e.g., fluorescent or
chromogenic, respectively) that results from the activation of these reporter genes. For
example, lacZ transcription can be monitored by incubation in the presence of a
chromogenic substrate, such as X-gal (5-bromo-4-chloro-3-indolyl-β-D-galactoside), of its
encoded enzyme, β-galactosidase. The pool of all interacting proteins isolated in this
manner from mating the C. elegans insulin-like sequence product and the library identifies
the "insulin-like interactive population".

False positives arising from transcriptional activation by the DNA binding domain
fusion proteins in the absence of a transcriptional activator domain fusion protein can be
prevented or reduced by negative selection for such activation within a host cell containing
the DNA binding fusion population, prior to exposure to the activation domain fusion
population. For example, if such cell contains URA3 as a reporter gene, negative selection
is carried out by incubating the cell in the presence of 5-fluoroorotic acid (5-FOA), which
kills cells expressing URA3. Hence, if the DNA-binding domain fusions by themselves
activate transcription, the metabolism of 5-FOA will lead to cell death and the removal of
self-activating DNA-binding domain hybrids.
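
The effect of this negative selection on a bait population can be sketched as a filter. The sketch assumes that each DNA-binding domain clone has already been scored for whether it activates the URA3 reporter on its own; the function names and clone labels are hypothetical.

```python
# Illustrative sketch only; names and labels are hypothetical.
from typing import Dict, List


def survives_5foa(ura3_expressed: bool) -> bool:
    """Cells expressing URA3 metabolize 5-FOA and die; others survive."""
    return not ura3_expressed


def remove_self_activators(baits: Dict[str, bool]) -> List[str]:
    """Given a mapping of bait clone -> 'activates URA3 without any
    activation domain fusion', return the clones that survive 5-FOA."""
    return [clone for clone, self_activates in baits.items()
            if survives_5foa(self_activates)]
```

Only the surviving clones would then be exposed to the activation domain fusion population.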

Negative selection involving the use of a selectable marker as a reporter gene and the 
presence in the cell medium of an agent toxic or growth inhibitory to the host cells in the 
absence of reporter gene transcription is preferred, since it allows a higher rate of processing 
than other methods. Negative selection can also be carried out on the activation domain
fusion population prior to interaction with the DNA binding domain fusion population, by 
similar methods, either alone or in addition to negative selection of the DNA binding fusion 
population. 

Negative selection can also be carried out on the recovered protein-protein complex
by known methods (see e.g., Bartel et al., 1993, BioTechniques 14:920-924) although
pre-negative selection (prior to the interaction assay) is preferred. For example, each plasmid
encoding a protein (peptide or polypeptide) fused to the activation domain (one-half of a
detected interacting complex) can be transformed back into the original screening strain,
either alone or with a plasmid encoding only the DNA-binding domain, the DNA-binding
domain fused to the detected interacting protein, or the DNA-binding domain fused to a
protein that does not affect transcription or participate in the protein-protein interaction. A
positive interaction detected with any plasmid other than that encoding the DNA-binding
domain fusion to the detected interacting protein is deemed a false positive and is
eliminated from the screen.
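
The retransformation test above amounts to a simple decision rule, sketched here under the assumption that each of the four transformations has been scored as reporter-positive or reporter-negative. The dictionary keys and the function name are hypothetical labels for those four conditions.

```python
# Illustrative sketch only; keys are hypothetical labels for the four
# retransformation conditions described above.
from typing import Dict

CONTROLS = ("prey_alone", "dbd_only", "dbd_unrelated")


def is_true_positive(reporter_on: Dict[str, bool]) -> bool:
    """A prey clone is retained only if the reporter is activated with the
    DNA-binding domain fused to the detected interacting protein
    ("dbd_bait") and with none of the control plasmids."""
    if not reporter_on["dbd_bait"]:
        return False
    return not any(reporter_on[control] for control in CONTROLS)
```

A clone that activates the reporter alone, with the DNA-binding domain only, or with an unrelated fusion is treated as a false positive under this rule.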

In a preferred embodiment, the C. elegans insulin-like plasmid population is
transformed into a yeast strain of a first mating type (a or alpha), and the second plasmid
population (containing the library of DNA sequences) is transformed into a yeast strain of a
different mating type. Both strains are preferably mutant for URA3 and HIS3, and contain
HIS3, and optionally lacZ, as reporter genes. The first set of yeast cells is positively
selected for the insulin-like plasmids and is negatively selected for false positives by
incubation in medium lacking the selectable marker (e.g., tryptophan) and containing
5-FOA. Yeast cells of the second mating type are transformed with the second plasmid
population, and are positively selected for the presence of the plasmids containing the
library of fusion proteins. Selected cells are pooled. Both groups of pooled cells are mixed
together and mating is allowed to occur on a solid phase. The resulting diploid cells are
then transferred to selective media that selects for the presence of each plasmid and for
activation of reporter genes.

After an interactive population is obtained, the DNA sequences encoding the pairs of
interactive proteins can be isolated by a method wherein either the DNA-binding domain
hybrids or the activation domain hybrids are amplified, in separate respective reactions.
Preferably, the amplification is carried out by polymerase chain reaction (PCR) using pairs
of oligonucleotide primers specific for either the DNA-binding domain hybrids or the
activation domain hybrids. This PCR reaction can also be performed on pooled cells
expressing interacting protein complexes, preferably pooled arrays of interactants. Other
amplification methods known in the art can be used, such as ligase chain reaction, use of Qβ
replicase, or methods listed in Kricka et al., 1995, Molecular Probing, Blotting, and
Sequencing, Academic Press, New York, Chapter 1 and Table IX.

The plasmids encoding the DNA-binding domain hybrid and the activation domain
hybrid proteins can also be isolated and cloned by any known method. For example, if a
shuttle (yeast to E. coli) vector is used to express the fusion proteins, the genes can be
recovered by transforming the yeast DNA into E. coli and recovering the plasmids from E.
coli. Alternatively, the yeast vector can be isolated, and the insert encoding the fusion
protein subcloned into a bacterial expression vector, for growth of the plasmid in E. coli.

Assays of insulin-like proteins

The functional activity of insulin-like proteins, derivatives and analogs can be
assayed using known methods. For example, immunoassays can be used to test the ability
to bind to an anti-insulin-like protein antibody, or to compete for binding with a wild-type
insulin-like protein. Various competitive and non-competitive assay systems can be used
such as radioimmunoassays, ELISA, immunoradiometric assays, gel diffusion precipitin
reactions, immunodiffusion assays, in situ immunoassays (e.g., using colloidal gold, enzyme
or radioisotope labels), western blots, precipitation reactions, agglutination assays (e.g., gel
agglutination assays, hemagglutination assays), complement fixation assays,
immunofluorescence assays, protein A assays, and immunoelectrophoresis assays, etc.
Physiological correlates of insulin-like protein binding to its substrates and/or receptors
(e.g., signal transduction) can be assayed.

In insect (e.g., D. melanogaster), worm (e.g., C. elegans), or other model systems,
genetic studies can be done to study the phenotypic effect of an insulin-like gene mutant that
is a derivative or analog of a wild-type insulin-like gene as described further below.

Antisense regulation of gene expression 

The invention provides for antisense sequences of C. elegans insulin-like genes. An
insulin-like "antisense" nucleic acid as used herein refers to a nucleic acid capable of
hybridizing to a portion of an insulin-like RNA (preferably mRNA) by virtue of some
sequence complementarity. Antisense nucleic acids may also be referred to as inverse
complement nucleic acids. The antisense nucleic acid may be complementary to at least a
portion of a coding and/or noncoding region of an insulin-like mRNA. Absolute
complementarity is not required, but should be sufficient so that a stable duplex with the
RNA can form. In the case of double-stranded insulin-like antisense nucleic acids, a single
strand of the duplex DNA may thus be tested, or triplex formation may be assayed. The
ability to hybridize will depend on both the degree of complementarity and the length of the
antisense nucleic acid. Generally, the longer the hybridizing nucleic acid, the more base
mismatches with an insulin-like RNA it may contain and still form a stable duplex (or
triplex, as the case may be). The degree of tolerable mismatch can be readily determined by
calculating the melting point of the hybridized complex.
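
As a rough illustration of the relationship between sequence, length and duplex stability, the sketch below derives the antisense (reverse-complement) strand of a DNA target, counts mismatches, and estimates a melting temperature. It assumes the Wallace rule (about 2 °C per A/T pair plus 4 °C per G/C pair), a textbook approximation for short oligonucleotides that is not taken from this specification; the function names and the example sequences are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def antisense(target):
    """Return the reverse complement (antisense strand) of a DNA target sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(target.upper()))


def wallace_tm(oligo):
    """Estimate melting temperature (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C).

    Assumption: this rule is only a coarse estimate for short oligonucleotides.
    """
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc


def mismatches(oligo, target):
    """Count positions where the oligo differs from the perfect antisense of the target."""
    perfect = antisense(target)
    if len(oligo) != len(perfect):
        raise ValueError("oligo and target must have the same length")
    return sum(1 for a, b in zip(oligo.upper(), perfect) if a != b)


target = "ATGGCTTGTTGG"      # invented 12-nt target, for illustration only
oligo = antisense(target)    # perfectly complementary antisense strand
print(oligo, wallace_tm(oligo), mismatches(oligo, target))
```

In practice a nearest-neighbor model and the actual salt conditions would give a better estimate; the sketch only shows that longer or more G/C-rich oligonucleotides start from a higher melting point and can therefore tolerate more mismatches.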
Antisense nucleic acids have utility in inhibiting an insulin-like protein function.
For example, such antisense nucleic acids may be useful as pesticides to eradicate parasites
in plants, or in animals such as dogs. A preferred antisense nucleic acid is a single stranded
DNA oligonucleotide comprising a sequence antisense to the sequence encoding a B peptide
domain or an A peptide domain of an insulin-like protein.

Preferably the antisense nucleic acids are oligonucleotides having at least 6
nucleotides and more preferably at least 10, 15, 20, or 50 nucleotides. Oligonucleotides
having at least 100 or 200 nucleotides can also be used. The oligonucleotides can be double
or single stranded RNA or DNA or chimeric mixtures or derivatives or modified versions
thereof. One or more modifications can be made at the base or sugar moiety, or phosphate
backbone. Examples of modified base moieties include 5-fluorouracil, 5-bromouracil,
5-chlorouracil, 5-iodouracil, hypoxanthine, xanthine, 4-acetylcytosine,
5-(carboxyhydroxylmethyl) uracil, 5-carboxymethylaminomethyl-2-thiouridine,
5-carboxymethylaminomethyluracil, dihydrouracil, beta-D-galactosylqueosine, inosine,
N6-isopentenyladenine, 1-methylguanine, 1-methylinosine, 2,2-dimethylguanine,
2-methyladenine, 2-methylguanine, 3-methylcytosine, 5-methylcytosine, N6-adenine,
7-methylguanine, 5-methylaminomethyluracil, 5-methoxyaminomethyl-2-thiouracil,
beta-D-mannosylqueosine, 5'-methoxycarboxymethyluracil, 5-methoxyuracil,
2-methylthio-N6-isopentenyladenine, uracil-5-oxyacetic acid (v), wybutoxosine,
pseudouracil, queosine, 2-thiocytosine, 5-methyl-2-thiouracil, 2-thiouracil, 4-thiouracil,
5-methyluracil, uracil-5-oxyacetic acid methylester, uracil-5-oxyacetic acid (v),
5-methyl-2-thiouracil, 3-(3-amino-3-N-2-carboxypropyl) uracil, (acp3)w, and
2,6-diaminopurine. Examples of modified sugar
moieties include arabinose, 2-fluoroarabinose, xylulose, and hexose. Examples of
modifications at the phosphate backbone include a phosphorothioate, a phosphorodithioate, a
phosphoramidothioate, a phosphoramidate, a phosphordiamidate, a methylphosphonate, an
alkyl phosphotriester, and a formacetal or analog thereof.

The oligonucleotide may include other appending groups such as peptides, agents 
that facilitate transport across the cell membrane or blood-brain barrier, hybridization- 
triggered cleavage agents or intercalating agents. 

The oligonucleotide can also be α-anomeric so that it forms specific double-stranded
hybrids with complementary RNA in which, contrary to the usual β-units, the strands run
parallel to each other.

The oligonucleotide may be conjugated to another molecule, e.g., a peptide, a
hybridization-triggered cross-linking agent, a transport agent, a hybridization-triggered
cleavage agent, etc.
An insulin-like antisense oligonucleotide may comprise catalytic RNA, or a
ribozyme (see e.g., WO 90/11364; Sarver et al., 1990, Science 247:1222-1225). In another
embodiment, the oligonucleotide is a 2'-O-methylribonucleotide (Inoue et al., 1987, Nucl.
Acids Res. 15:6131-6148), or a chimeric RNA-DNA analogue (Inoue et al., 1987, FEBS
Lett. 215:327-330).

The oligonucleotides may be synthesized by known methods, e.g., by use of an
automated DNA synthesizer (commercially available from Biosearch, Applied Biosystems,
etc.). Phosphorothioate oligonucleotides may be synthesized by the method of Stein et al.
(1988, Nucl. Acids Res. 16:3209), methylphosphonate oligonucleotides can be prepared by
use of controlled pore glass polymer supports (Sarin et al., 1988, Proc. Natl. Acad. Sci.
U.S.A. 85:7448-7451), etc. Alternatively, the insulin-like antisense nucleic acids can be
produced intracellularly by transcription from an exogenous sequence. For example, a
vector can be introduced in vivo such that it is taken up by a cell, within which cell the
vector or a portion thereof is transcribed, producing an antisense nucleic acid (RNA). Such
a vector would contain a sequence encoding the insulin-like antisense nucleic acid. The
vector can remain episomal or become chromosomally integrated, as long as it can be
transcribed to produce the desired antisense RNA. Such vectors can be constructed by
recombinant DNA technology methods standard in the art. Vectors can be plasmid, viral, or
others known in the art, used for replication and expression in mammalian cells. Expression
of the sequence encoding the insulin-like antisense RNA can be by any promoter, inducible
or constitutive, known to act in mammalian cells, such as those previously discussed.

Identifying signaling pathways and phenotypes 

Animal models may be used in the identification and characterization of C.
elegans insulin-like protein signaling pathways, and/or phenotypes associated with the
mutation or abnormal expression of a C. elegans insulin-like protein. Methods of producing
a variety of animal models using novel genes and proteins are well known (see e.g., WO
96/34099); three examples are discussed below.

In one type of animal model a normal C. elegans insulin-like gene has been
recombinantly introduced into the genome of the animal as an additional gene, under the
regulation of either an exogenous or an endogenous promoter element, and as either a
minigene or a large genomic fragment. The normal gene can be recombinantly substituted
(e.g., by homologous recombination or gene targeting) for one or both copies of the animal's
homologous gene.

In a second model animal, a mutant C. elegans insulin-like gene has been
recombinantly introduced into the genome of the animal as an additional gene, under the
regulation of either an exogenous or an endogenous promoter element, and as either a
minigene or a large genomic fragment. The mutant gene can be recombinantly substituted
for one or both copies of the animal's homologous gene.

Third, animals are provided in which a mutant version of one of that animal's own
genes (bearing, for example, a specific mutation corresponding to, or similar to, a
pathogenic mutation of an insulin-like gene from another species) has been recombinantly
introduced into the genome of the animal as an additional gene, under the regulation of
either an exogenous or an endogenous promoter element, and as either a minigene or a large
genomic fragment.

Finally, equivalents of transgenic animals, including animals with mutated or
inactivated genes, may be produced using chemical or x-ray mutagenesis. Using the
isolated nucleic acids disclosed herein one may more rapidly screen the resulting offspring
by, for example, direct sequencing, restriction fragment length polymorphism (RFLP)
analysis, PCR, or hybridization analysis to detect mutants, or Southern blotting to
demonstrate loss of one allele.
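
The RFLP screen mentioned above can be pictured with a small in silico digest. This is a minimal sketch, not a method from the specification: the sequences are invented, and the helper `fragment_lengths` is a hypothetical name. It uses the EcoRI recognition site GAATTC, which is cut after the first G, to show how a point mutation that destroys a site changes the fragment pattern.

```python
def fragment_lengths(sequence, site, cut_offset):
    """Digest a linear sequence at every occurrence of `site`.

    The cut is placed `cut_offset` bases into the site. Returns the fragment
    lengths in 5' to 3' order.
    """
    sequence = sequence.upper()
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    bounds = [0] + cuts + [len(sequence)]
    return [bounds[i + 1] - bounds[i] for i in range(len(bounds) - 1)]


ECORI = "GAATTC"  # EcoRI cuts G^AATTC, i.e. one base into the site

# Invented sequences: the mutant carries an A->G change that destroys the second site.
wild_type = "AAAA" + "GAATTC" + "CCCCCC" + "GAATTC" + "TT"
mutant = "AAAA" + "GAATTC" + "CCCCCC" + "GAGTTC" + "TT"

print(fragment_lengths(wild_type, ECORI, 1))
print(fragment_lengths(mutant, ECORI, 1))
```

The wild-type sequence yields three fragments while the mutant yields two, which is the kind of band-pattern difference an RFLP analysis would detect on a gel.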

Such animal models may be used to identify phenotypes associated with mutation or
abnormal expression of a C. elegans insulin-like protein and to identify a C. elegans
insulin-like protein signaling pathway. For example, a C. elegans insulin-like gene can be disrupted
(e.g., mutated or abnormally expressed) and the effect can be identified using any suitable
assay commonly used in C. elegans research (e.g., a dauer formation assay, a developmental
assay, an energy metabolism assay, a growth rate assay and a reproductive capacity assay).
The gene can be disrupted by any suitable method such as EMS chemical deletion
mutagenesis, transposon insertion mutagenesis, or double-stranded RNA interference, as
discussed in detail below.

Abnormal expression can be overexpression, underexpression (e.g., due to 
inactivation), expression at a developmental time different from wild-type animals, or 
expression in a cell type different from in wild-type animals. 

Assays for changes in gene expression 

Changes in the expression of identified C. elegans insulin-like genes and proteins
can be detected using known methods (see e.g., WO 96/34099). Such assays may be performed in
vitro using transformed cell lines, immortalized cell lines, or recombinant cell lines, or in
vivo using animal models. The assays may detect the presence of increased or decreased
expression of a C. elegans insulin-like gene or protein on the basis of increased or decreased
mRNA expression (using, e.g., nucleic acid probes), increased or decreased levels of related
protein products (using, e.g., the antibodies disclosed herein), or increased or decreased
levels of expression of a marker gene (e.g., β-galactosidase or luciferase) operably linked to
a 5' regulatory region in a recombinant construct.

Various expression analysis techniques may be used to identify genes which are
differentially expressed between two conditions, such as a cell line or animal expressing a
normal C. elegans insulin-like gene compared to another cell line or animal expressing a
mutant C. elegans insulin-like gene. Such techniques include differential display, serial
analysis of gene expression (SAGE), nucleic acid array technology, subtractive
hybridization, proteome analysis and mass-spectrometry of two-dimensional protein gels.
Nucleic acid array technology (i.e., gene chips) may be used to determine a global (i.e.,
genome-wide) gene expression pattern in a normal C. elegans animal for comparison with
an animal having a mutation in one or more C. elegans insulin-like genes.

Gene expression profiling can be used to identify other genes (or proteins) that may
have a functional relation to (e.g., may participate in a signaling pathway with) a C. elegans
insulin-like gene. The genes are identified by detecting changes in their expression levels
following mutation (i.e., insertion, deletion or substitution) in, or overexpression,
underexpression, mis-expression or knock-out of, a C. elegans insulin-like gene, as
described in the examples below. Expression profiling methods provide a powerful
approach for analyzing the effects of mutation in a C. elegans insulin-like gene. A variety
of methods are well known in the art including subtractive hybridization, differential
display, serial analysis of gene expression (SAGE), proteome analysis, and
hybridization-based methods employing nucleic acid arrays.
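
Whatever profiling platform is used, the comparison step reduces to ranking genes by how much their expression changes between the normal and mutant animals. The sketch below is one simple way to do that, assuming expression levels are already available as numbers per gene; the gene names, values, pseudocount and two-fold threshold are illustrative assumptions and not part of the specification.

```python
import math


def differentially_expressed(wild_type, mutant, min_abs_log2_fc=1.0, pseudocount=1.0):
    """Return {gene: log2 fold change} for genes whose expression changes strongly.

    `wild_type` and `mutant` map gene names to expression levels (arbitrary units).
    The pseudocount avoids division by zero for genes that are not detected.
    Only genes measured in both samples are compared.
    """
    changed = {}
    for gene in wild_type.keys() & mutant.keys():
        ratio = (mutant[gene] + pseudocount) / (wild_type[gene] + pseudocount)
        log2_fc = math.log2(ratio)
        if abs(log2_fc) >= min_abs_log2_fc:
            changed[gene] = log2_fc
    return changed


# Invented expression values for three hypothetical genes.
wild_type = {"geneA": 99, "geneB": 49, "geneC": 9}
mutant = {"geneA": 399, "geneB": 50, "geneC": 4}
print(differentially_expressed(wild_type, mutant))
```

Genes passing the threshold (here geneA, up four-fold, and geneC, down two-fold) would be candidates for a functional relationship with the mutated insulin-like gene; a real analysis would add replicates and a statistical test.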

Identification of compounds with binding capacity 

Screening methodologies can be used for the identification of proteins and other
compounds which bind to, or otherwise directly interact with, the C. elegans insulin-like
genes and proteins of the invention. Suitable screening methods are disclosed in WO
96/34099. The proteins and compounds include endogenous cellular components which
interact with the identified genes and proteins in vivo and which, therefore, may provide
new targets for pharmaceutical and therapeutic interventions, as well as recombinant,
synthetic, and otherwise exogenous compounds which may have binding capacity and,
therefore, may be candidates for pharmaceutical agents. Thus, cell lysates or tissue
homogenates may be screened for proteins or other compounds which bind to one of the
normal or mutant C. elegans insulin-like genes and proteins. Alternatively, any of a variety
of exogenous compounds, both naturally occurring and/or synthetic (e.g., libraries of small
molecules or peptides), may be screened for binding capacity. Typically, a screening
method comprises the step of mixing a C. elegans insulin-like protein or fragment or
derivative thereof with test compounds, allowing time for any binding to occur, and
assaying for any bound complexes.

EXAMPLES 

The following examples are provided merely as illustrative of various aspects of the
invention and shall not be construed to limit the invention in any way. The Examples
describe the discovery of an unexpectedly large family of insulin-like genes in C. elegans
which includes the 31 genes as illustrated in the alignment of FIG. 3 and in FIGs 4-36 and
described in detail below. The SEQ ID NO for each protein and cDNA corresponding to
these insulin-like genes is set forth in Table 1 below.

Table 1. C. elegans insulin-like genes and the corresponding sequence identification
number (SEQ ID NO:) for each encoded protein and cDNA. See FIG. 4 through
FIG. 34 for annotated sequences.

EXAMPLE 1: PCR CLONING OF C. ELEGANS INSULIN-LIKE cDNAs

Twenty-two C. elegans insulin-like genes have been cloned using the polymerase
chain reaction (PCR), as described in detail below. See Table 1 for the assigned name of
each of the eighteen C. elegans insulin-like genes, and the corresponding sequence
identification number for the nucleotide sequence of each cDNA and the amino acid
sequence of each protein.

PCR primers were designed for cloning each gene under the following general 
rationale. For further details specific for each gene, see the Examples section below. 

Genes ZK75.3, ZK75.1, ZK1251.2 and ZK1251.N were all predicted to have an SL1
splice acceptor upstream of the predicted start codon. Therefore, the SL1 sequence was 
used as the upstream primer for each of these cDNAs. ZK84.6 was predicted to have a 
splice acceptor upstream of the start codon; however, no PCR product was obtained using 
SL1 as an upstream primer. Therefore, the sequence immediately following the predicted 
splice acceptor was used. The downstream primers were chosen to fall downstream of the 
predicted stop codon. 

For M04D8.1, M04D8.2, M04D8.3, C17C3.4, C17C3.N, F13B12.N, T28B8.N,
ZC334.N, and ZK84.N, primers had a HindIII site on the end of the 5' primer according to
the formula CCC-AAGCTT-N, where N = 24 to 26 specific nucleotides; and an XbaI site on
the end of the 3' primer according to the formula GC-TCTAGA-N, where N = 24 to 26
specific nucleotides. The engineered restriction sites of these primers were used for cloning.
F56F3.6 has an internal XbaI site, so an XhoI site was used instead on the 3' primer. What
follows is a list of conditions used for PCR amplification and cloning of each gene.
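
The primer formulas CCC-AAGCTT-N and GC-TCTAGA-N can be written out as a short Python sketch. The function names are hypothetical and chosen for illustration; the clamp and restriction-site sequences come from the formulas above, and the two gene-specific sequences in the usage example are those of the ZK84.N primers (SEQ ID NO:47 and SEQ ID NO:48) listed later in this Example.

```python
HINDIII_SITE = "AAGCTT"
XBAI_SITE = "TCTAGA"


def tailed_primer(clamp, site, specific):
    """Join a 5' clamp, a restriction site and the gene-specific nucleotides (N)."""
    return (clamp + site + specific).upper()


def forward_primer(specific):
    """5' primer following the formula CCC-AAGCTT-N (HindIII site)."""
    return tailed_primer("CCC", HINDIII_SITE, specific)


def reverse_primer(specific):
    """3' primer following the formula GC-TCTAGA-N (XbaI site)."""
    return tailed_primer("GC", XBAI_SITE, specific)


def has_internal_site(insert, site):
    """True if the amplified insert itself contains the cloning site,
    in which case a different enzyme is needed for that end."""
    return site in insert.upper()


print(forward_primer("TGTTATTTAATGATGTGGAGATGG"))
print(reverse_primer("ATGGTAAATACAGAACATTGGTTC"))
```

A check such as `has_internal_site` reflects the reasoning given for F56F3.6, where an internal XbaI site led to the use of XhoI on the 3' primer.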

ZK75.1 

The template DNA source was a mixed-stage C. elegans cDNA library, oligo-dT
primed and ligated into UniZap XR (phage lambda) vector available from Stratagene. The
library DNA was prepared by Qiagen purification and adjusted to a concentration of 70
ng/µl.

The cDNA was generated by the polymerase chain reaction (PCR) procedure, using
the Boehringer Mannheim Expand High Fidelity PCR System. Each reaction was
performed in a total volume of 100 µl. The components of the reaction were 1 µl (70 ng)
template DNA, 200 µM each dNTP, 300 nM each primer as described below, 1X buffer
with MgCl2 as supplied by the manufacturer, and 2.6 U of enzyme.

First, the primers were pooled and denatured at 95°C for 5:00 (where 0:00 indicates
time in minutes:seconds), and stored on ice. The remainder of the reaction mixture was
added, and the PCR reaction started as follows:
    95°C for 2:00
    35 cycles of:  95°C for 0:15
                   54°C for 0:30
                   72°C for 1:00
    72°C for 5:00
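
The reaction set-up and cycling program above involve some simple arithmetic, sketched below in Python. The helper names are hypothetical; the numbers in the usage lines (300 nM primer in 100 µl, and the 2:00 / 35 x (0:15, 0:30, 1:00) / 5:00 program) are taken from the text. The time calculation ignores ramping between temperatures, which is an assumption of this sketch.

```python
def pmol_in_reaction(concentration_nM, volume_ul):
    """Amount of a component in pmol.

    nM x µl gives femtomoles, so the product is divided by 1000 to get picomoles.
    """
    return concentration_nM * volume_ul / 1000.0


def seconds(mmss):
    """Convert a 'minutes:seconds' string such as '0:15' to seconds."""
    minutes, secs = mmss.split(":")
    return int(minutes) * 60 + int(secs)


def program_hold_time(initial, cycle_steps, cycles, final):
    """Total programmed hold time in seconds (temperature ramping not included)."""
    per_cycle = sum(seconds(step) for step in cycle_steps)
    return seconds(initial) + cycles * per_cycle + seconds(final)


primer_pmol = pmol_in_reaction(300, 100)  # each primer at 300 nM in a 100 µl reaction
total = program_hold_time("2:00", ["0:15", "0:30", "1:00"], 35, "5:00")
print(primer_pmol, total, total / 60.0)
```

For the conditions given, each primer is present at 30 pmol per reaction and the program holds for 4095 seconds, a little over 68 minutes before ramping time is added.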

For the first round of PCR, the primers used were as follows:

75.1 GACGGAGATGGCTTGTTGGACGAC (SEQ ID NO:37) 

SL1 GGTTTAATTACCCAAGTTTGAG (SEQ ID NO:38) 

The first round of PCR yielded no detectable band as determined by agarose gel
electrophoresis, staining with ethidium bromide, and visualization on a long-wave UV light
box. Accordingly, a second round of PCR was next performed as described above, except
with the following changes. The template DNA was 1 µl of the first round PCR reaction,
the reactions were run for 20 cycles only, and different (nested) primers were used as
follows:

75.1.5' CAAGAGAATGTTTTCATTCTTTAC (SEQ ID NO:39)
75.1B TTACTTTTCTGGGCAGCAAGCTTG (SEQ ID NO:40)

The second PCR reaction yielded a strong single band of DNA at the predicted size.
To subclone this PCR product into a plasmid vector for DNA sequencing, we first isolated
the PCR product by agarose gel electrophoresis (90 µl of the second PCR reaction run on a
1.2% gel). We excised the band with a razor blade and purified the product from the gel
using the Prep-a-Gene kit from BioRad. We then ligated the PCR product into the plasmid
vector PCRII and transformed E. coli using an InVitrogen TA Cloning Kit. We screened
bacterial colonies for the correct plasmid by preparing mini-prep DNA using the Primm
Labs Mini-Prep kit, and analyzed the mini-prep DNA by EcoRI restriction digest and
agarose gel electrophoresis.

We sequenced the subcloned PCR products by thermal cycling, using the Big Dye
ready reaction mix sequencing kit. For each sequencing reaction, we added: approximately
100 ng of mini-prep DNA; 0.8 pmol of sequencing primer; 1.5 µl 5X Big Dye ready
reaction buffer; 1 µl 80 mM Tris, 2 mM MgCl2, pH 9.0; and adjusted the volume to 10 µl
with distilled water. The M13 Forward and M13 Reverse sequencing primers were used.
The sequencing reactions were thermal cycled using the following program:
    96°C for 5:00
    25 cycles of:  96°C for 0:30
                   50°C for 0:15
                   60°C for 4:00
We precipitated the cycled DNA with 75 µl 70% ethanol/5 mM MgCl2 by incubating
at room temperature for 20 minutes. We recovered the precipitated DNA by centrifugation
at 15,000 x g for 30 minutes, removed the supernatant, and further dried the DNA pellet by
vacuum centrifugation for 10 minutes. The sequencing reactions were analyzed and the
DNA sequence determined by gel electrophoresis and fluorescent detection of sequencing
products.

ZK75.2 

The template DNA source was mixed-stage C. elegans first strand cDNA, poly-A
selected and oligo-dT primed using the Gibco-BRL Superscript kit. The RNA was removed
by RNAse digestion, and the cDNA was diluted with TE buffer and adjusted to a final
concentration of approximately 70 ng/µl. The cDNA was generated by the polymerase
chain reaction (PCR) procedure, using the Boehringer Mannheim Expand High Fidelity
PCR System. Each reaction was performed in a total volume of 100 µl. The components of
the reaction were 1 µl (70 ng) template DNA, 200 µM each dNTP, 300 nM each primer as
described below, 1X buffer with MgCl2 as supplied by the manufacturer, and 2.6 U enzyme.

First, the template was denatured at 95°C for 5:00 and stored on ice. The
remainder of the reaction mixture was added, and the PCR reaction started as follows:
    95°C for 2:00
    35 cycles of:  95°C for 0:15
                   54°C for 0:30
                   72°C for 1:00
    72°C for 5:00

For the first round of PCR, the primers were as follows:
75.2.5' CTACCATGAACGCTATAATCTTCT (SEQ ID NO:41)
75.2.3' ATGATAGTACGATATGTCCATAAC (SEQ ID NO:42)

This reaction yielded a single strong band of the expected size (349 bp) after one 
round of PCR. 

To subclone the PCR product into a plasmid vector for DNA sequencing, we first
isolated the PCR product by agarose gel electrophoresis (90 µl of the PCR reaction
run on a 1.2% gel). We excised the band with a razor blade, and purified the product from
the gel using the Prep-a-Gene kit from BioRad. We then ligated the PCR product into the
plasmid vector PCRII and transformed E. coli using the InVitrogen TA Cloning Kit. We
screened bacterial colonies for the correct plasmid by colony PCR, using the following
primers:

75.2.5' CTACCATGAACGCTATAATCTTCT (SEQ ID NO:41)
75.2.3' ATGATAGTACGATATGTCCATAAC (SEQ ID NO:42)

To confirm the positive colonies, we prepared mini-prep plasmid DNA from
positive colonies using the Primm Labs miniprep kit and confirmed the plasmid by EcoRI
restriction digest and agarose gel electrophoresis.
We analyzed the sequence of the PCR product as described for ZK75.1.

ZK75.3 

A first round PCR reaction was performed exactly as for ZK75.2, except using
primers:

75.3 CCTATTTTCCAGCCACAGCACTCTC (SEQ ID NO:43)
SL1 GGTTTAATTACCCAAGTTTGAG (SEQ ID NO:38)

No band was obtained after the first round of PCR. Strong bands of 426 bp were
obtained after the second round of PCR, which was performed as follows:
    template = 2 µl of first round PCR
    same primers as first round
    same PCR conditions as first round
Subcloning and sequencing of the second round reaction product was performed
exactly as for ZK75.1.

ZK84.6

First round PCR was performed exactly as for ZK75.1, except using primers:
84.3OUTER CCCCGTACTCATTTTCCGTTATCC (SEQ ID NO:44)
84.3 GTATGGTACAGAGACTGATATCGG (SEQ ID NO:45)

A strong single band of 423 bp was obtained after the first round of PCR.
Subcloning and sequencing of PCR products was performed exactly as for ZK75.2, except
using the following primers for colony PCR screening:
84.3OUTER CCCCGTACTCATTTTCCGTTATCC (SEQ ID NO:44)
84.3.5'B CAAGGAAAATGCACTCGATCGTCG (SEQ ID NO:46)

ZK84.N

The template DNA source was a mixed-stage C. elegans cDNA library, oligo primed
and ligated into UniZap XR (phage lambda) vector, purchased from Stratagene. The library
DNA was prepared by Qiagen purification and adjusted to a concentration of 70 ng/µl.

The cDNA was generated by the polymerase chain reaction (PCR) procedure, using
the Boehringer Mannheim Expand High Fidelity PCR System. Each reaction was
performed in a total volume of 50 µl. The components of the reaction were 0.5 µl (70 ng)
template DNA, 100 µM each dNTP, 150 nM each primer as described below, 1X buffer
with MgCl2 as supplied by the manufacturer, and 1.3 U enzyme.

First, the template was denatured at 95°C for 5:00, and stored on ice. The
remainder of the reaction mixture was added, and the PCR reaction started as follows:
    95°C for 2:00
    35 cycles of:  95°C for 0:15
                   54°C for 0:30
                   72°C for 1:00

For the first round of PCR, the primers were:

84.NF-HIN CCCAAGCTTTGTTATTTAATGATGTGGAGATGG (SEQ ID NO:47)
84.NR-XBA GCTCTAGAATGGTAAATACAGAACATTGGTTC (SEQ ID NO:48)

This reaction yielded a strong single band of DNA at the predicted size. To
subclone the PCR product into a plasmid vector for DNA sequencing, we first purified the
PCR product with the Geneclean kit (Bio 101), then digested the product with HindIII and
XbaI and isolated the PCR product by agarose gel electrophoresis (45 µl of the PCR
reaction run on a 1.2% gel). We excised the band with a razor blade, and purified the
product from the gel using the Geneclean kit. We then ligated the cut PCR product into the
plasmid vector pcDNA3.1 (InVitrogen) cut with HindIII/XbaI and transformed E. coli. We
screened bacterial colonies for the correct plasmid by preparing mini-prep DNA using the
Primm Labs Mini-Prep kit, and analyzed the mini-prep DNA by PmeI restriction digest and
agarose gel electrophoresis.

We sequenced the subcloned PCR products by thermal cycling, using the Big Dye
ready reaction mix sequencing kit. For each sequencing reaction, we added approximately
100 ng of mini-prep DNA; 0.8 pmol of sequencing primer; 1 µl 5X Big Dye ready reaction
buffer; 1.5 µl 80 mM Tris, 2 mM MgCl2, pH 9.0; and adjusted the volume to 10 µl with
distilled water. The sequencing primers used were pcDNA3.1BGHReverse and a T7
promoter primer. The sequencing reactions were thermal cycled using the following
program:
    96°C for 5:00
    25 cycles of:  96°C for 0:30
                   50°C for 0:15
                   60°C for 4:00

We precipitated the cycled DNA with 75 µl 70% ethanol/5 mM MgCl2 by incubating
at room temperature for 20 minutes. We recovered the precipitated DNA by centrifugation
at 15,000 x g for 30 minutes, removed the supernatant, and further dried the DNA pellet by
vacuum centrifugation for 10 minutes. The sequencing reactions were analyzed and the
DNA sequence determined by gel electrophoresis and fluorescent detection of sequencing
products.

ZK84.N2 

PCR was performed exactly as for ZK84.N, except using PCR primers: 
ORPR-XBA GCTCTAGAGTGACGGTAGGTGTGTAGATGAAC (SEQ ID NO:49) 
84.35' ATCGAAACTCTTCAATCTTCAAGG (SEQ ID NO:50)

This reaction yielded a strong single band of DNA at the predicted size. To
subclone the PCR product into a plasmid vector for DNA sequencing, we first isolated the
PCR product by agarose gel electrophoresis (45 µl of the PCR reaction run on a 1.2% gel).
We excised the band with a razor blade, and purified the product from the gel using the
Geneclean kit. We then ligated the PCR product into the plasmid vector PCRII and
transformed E. coli using the InVitrogen TA Cloning Kit. We screened bacterial colonies
for the correct plasmid by preparing mini-prep DNA using the Primm Labs Mini-Prep kit,
and analyzed the mini-prep DNA by PmeI restriction digest and agarose gel electrophoresis.

We sequenced the subcloned PCR products by thermal cycling, using the Big Dye
ready reaction mix sequencing kit. For each sequencing reaction, we added approximately
100 ng of mini-prep DNA; 0.8 pmol of sequencing primer; 1 µl 5X Big Dye ready reaction
buffer; 1.5 µl 80 mM Tris, 2 mM MgCl2, pH 9.0; and adjusted the volume to 10 µl with
distilled water. The sequencing primers used were pcDNA3.1BGHReverse and a T7
promoter primer. The sequencing reactions were thermal cycled using the following
program:
    96°C for 5:00
    25 cycles of:  96°C for 0:30
                   50°C for 0:15
                   60°C for 4:00

We precipitated the cycled DNA with 75 µl 70% ethanol/5 mM MgCl2 by incubating
at room temperature for 20 minutes. We recovered the precipitated DNA by centrifugation
at 15,000 x g for 30 minutes, removed the supernatant, and further dried the DNA pellet by
vacuum centrifugation for 10 minutes. The sequencing reactions were analyzed and the
DNA sequence determined by gel electrophoresis and fluorescent detection of sequencing
products.

ZK1251.2

PCR was performed exactly as for ZK75.1, except using primers:
SL1 GGTTTAATTACCCAAGTTTGAG (SEQ ID NO:38)
1251.2 GATAGAAGAAATTAAGGACAGCAC (SEQ ID NO:51)

A single strong band of 351 bp was obtained after one round of PCR. Subcloning
and sequencing of PCR products was performed exactly as for ZK75.1.

ZK1251.N 

PCR was performed exactly as for ZK75.1, except using primers: 
1251.N GTAAACGATTAGATTAAGGACAAC (SEQ ID NO:52)
SL1 GGTTTAATTACCCAAGTTTGAG (SEQ ID NO:38)

No band was obtained after the first round of PCR. A second round was performed
using an aliquot of the first round reaction as template, the same reaction mix and primers,
and the same PCR conditions. Strong bands of 349 bp were obtained after the second round
of PCR. Subcloning and sequencing was performed exactly as for ZK75.1.

C06E2.N

PCR was performed exactly as for ZK75.1, except using primers:
C06E2.5' GAGGAGTGAAACGATGATCGTCAC (SEQ ID NO:53)
C06E2 ATCCAATTGAGAAGACGATTGTTG (SEQ ID NO:54)

No band was obtained after the first round of PCR. A second round of PCR was
performed using an aliquot of the first round as template, the same reaction mix and
primers, and the same PCR conditions as in the first round, but for 20 cycles rather than 35
cycles.

A single strong band of 404 bp was obtained after the second round of PCR.
Subcloning and sequencing of PCR products was performed exactly as for ZK75.1.

M04D8.1 

PCR was performed exactly as for ZK84.N, except using primers: 
8.1F-HIN CCCAAGCTTTTGAACCATGAAAACCTACTCATT (SEQ ID NO:55)
8.1R-XBA GCTCTAGAGCTTTTTTTTATTCGGGACAGCAA (SEQ ID NO:56)

M04D8.3 

PCR was performed exactly as for ZK84.N, except using primers:
8.3F-HIN CCCAAGCTTGGATTTCTGGAATTTCGATAATG (SEQ ID NO:57)
8.3R-XBA GCTCTAGAGCAGCATAGAATGGCGGAAGATC (SEQ ID NO:58)

C17C3.4 

PCR was performed exactly as for ZK84.N, except using primers: 
3.4F-HIN CCCAAGCTTGTGTAGGAATCGTTAAATATGTCT (SEQ ID NO:59) 
3.4R-XBA GCTCTAGAGAGATCATATTATATTACACGAAC (SEQ ID NO:60)

F13B12.N 

PCR was performed exactly as for ZK84.N, except using primers: 
B12F-HIN CCCAAGCTTCCGCTCTCAACAACGGGCCACACG (SEQ ID NO:61)
B12R-XBA GCTCTAGAGATGAATAAGTTATCAATTATCGT (SEQ ID NO:62)



T28B8.N 

PCR was performed exactly as for ZK84.N, except using primers: 
SL1-HIN CCCAAGCTTGGTTTAATTACCCAAGTTTGAG (SEQ ID NO:63)
B8.2R-XBA GCTCTAGATGATGCGTATTTTGTGGGCGGTAC (SEQ ID NO:64) 

ZC334.N 

PCR was performed exactly as for ZK84.N, except using primers:
SL1-HIN CCCAAGCTTGGTTTAATTACCCAAGTTTGAG (SEQ ID NO:63) 
34.MR-XBA GCTCTAGACTCATCAGTTGAAAATGAATTTAAG (SEQ ID NO:65) 

F56F3.6

PCR was performed exactly as for ZK84.N, except using primers:
F3.6F-HIN CCCAAGCTTGGCATAAGCGAGTATCTGTGATCC (SEQ ID NO:66)
F3.6R-XHO CCGCTCGAGGTAAAGCGAGGGTAAAGTAGATCG (SEQ ID NO:67)

M04D8.2 

PCR was performed exactly as for ZK84.N, except using primers: 
8.2F-HIN CCCAAGCTTCTAACCAACAAAAATGCACACTAC (SEQ ID NO:68) 
8.2R-XBA GCTCTAGACACGTGAACAATCTTTATCTTTAT (SEQ ID NO:69) 

C17C3.N 

PCR was performed exactly as for ZK84.N, except using primers: 
3.NF-HIN CCCAAGCTTCACAGCCAAAAACAAAAATGCAATC (SEQ ID NO:70)
3.NR-XBA GCTCTAGACACAGTATTTTAATGAAGGAGATC (SEQ ID NO:71)

T08G5.N 



PCR was performed exactly as for ZK84.N, except using 0.5 µl (35 ng) of template
DNA and PCR primers:

SL1-HIN CCCAAGCTTGGTTTAATTACCCAAGTTTGAG (SEQ ID NO:144)
G5.NR-XBA GCTCTAGATAATTCAATGAAAAGGCAAAACGACG (SEQ ID NO:145)

This reaction yielded four bands after one round of PCR. The cDNA was contained 
within an approximately 315 bp DNA fragment. Subcloning and sequencing of PCR 
products was performed exactly as for ZK75.1, except with the following sequencing
primers:
pcDNA3.1BGH Reverse TAGAAGGCACAGTCGAGG (SEQ ID NO:146)
T7 promoter primer TAATACGACTACTATAGGG (SEQ ID NO:147)

F41G3.N 

PCR was performed exactly as for T08G5.N, except using PCR primers:

G3.NF-HIN CCCAAGCTTCTTCATTTGGGCTTCATTTTACCAC (SEQ ID NO: 148) 
G3.NR-XBA GCTCTAGAGAAACAATGTTTTTATTCAACATG (SEQ ID NO: 149) 

This reaction yielded a band of the expected size after one round of PCR. The PCR 
product was cloned into pcDNA3.1 and sequenced exactly as described for ZK75.1. 


F41G3.N2 

PCR was performed exactly as for T08G5.N, except using PCR primers: 
G3.N2F-OUT CCCAAGCTTGGACTTTATCACAATTTCCAGCAC (SEQ ID NO: 154) 
G3.N2R-XBA GCTCTAGAGTTTCTAGATTTTTAGATTTCGTG (SEQ ID NO: 155) 
No band was visualized after the first round of PCR. A second PCR was performed as
described above with the following changes: the template DNA was 1 µl of the first round
PCR reaction, the reactions were run for 20 cycles only, and a different (nested) 3' primer
was used. The primers were:

G3.N2F-XHO CCGCTCGAGATAATGAAGCTTCTTCTTCTCATTG (SEQ ID NO:156)
G3.N2R-XBA GCTCTAGAGTTTCTAGATTTTTAGATTTCGTG (SEQ ID NO:157)

This reaction yielded a band of the expected size. The PCR product was subcloned
into pcDNA3.1 and sequenced exactly as described for T08G5.N, except the restriction
enzymes used to digest the PCR product and vector were XbaI and XhoI.

C17C3.N2

PCR was performed exactly as for T08G5.N, except using PCR primers:
C3.N2F-XHO CCGCTCGAGCTCGACGTTCTTCAATCTATATTTC (SEQ ID NO:150)
C3.N2R-XBA GCTCTAGACAAACACCATTAAATCTGTATTTAAAC (SEQ ID NO:151)

No band appeared after the first round of PCR. A second round of PCR was
performed exactly as before using the following primers:

C3.N2F-XHO CCGCTCGAGCTCGACGTTCTTCAATCTATATTTC (SEQ ID NO:164)
C3.N2R-INN GCTCTAGAGTTCACAAATTCATTTTCAAATACG (SEQ ID NO:165)

This reaction yielded a single strong band of the expected size. The PCR product was
subcloned into pcDNA3.1 and sequenced exactly as described for T08G5.N, except the
restriction enzymes used to digest the PCR product and vector were XbaI and XhoI.

Y52A1.N

The template DNA source was mixed-stage C. elegans first-strand cDNA, poly-A
selected and oligo-dT primed using the Gibco-BRL Superscript kit. The RNA was removed
by RNAse digestion, and the cDNA was diluted with TE buffer and adjusted to a final
concentration of approximately 70 ng/µl.

The cDNA was generated by the polymerase chain reaction (PCR) procedure, using
the Boehringer Mannheim Expand High Fidelity PCR System. Each reaction was
performed in a total volume of 50 µl. The components of the reaction were 0.5 µl (35 ng)
template DNA, 100 µM each dNTP, 150 nM each primer as described below, 1X buffer
with MgCl2 as supplied by the manufacturer, and 1.3 units of enzyme.
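
The final concentrations listed above translate into pipetting volumes through the dilution relation C1 x V1 = C2 x V2. The sketch below works this through for the 50 µl reaction. The stock concentrations (10 mM dNTP mix, 10 µM primers, 10X buffer) are assumptions chosen for illustration and are not stated in the text, and the small enzyme volume is left out.

```python
def stock_volume_ul(final_conc, stock_conc, total_volume_ul):
    """Volume of stock needed, from C1*V1 = C2*V2 (same unit for both concentrations)."""
    if stock_conc <= 0:
        raise ValueError("stock concentration must be positive")
    return final_conc * total_volume_ul / stock_conc

TOTAL_UL = 50.0     # total reaction volume, from the text
TEMPLATE_UL = 0.5   # 0.5 µl (35 ng) template, from the text

# Stock concentrations below are hypothetical, chosen only for illustration.
mix = {
    "dNTP mix (10 mM each, assumed)": stock_volume_ul(100.0, 10_000.0, TOTAL_UL),  # µM
    "forward primer (10 µM, assumed)": stock_volume_ul(0.150, 10.0, TOTAL_UL),     # µM
    "reverse primer (10 µM, assumed)": stock_volume_ul(0.150, 10.0, TOTAL_UL),     # µM
    "buffer with MgCl2 (10X, assumed)": stock_volume_ul(1.0, 10.0, TOTAL_UL),      # X
}

# Water makes up the remainder; the enzyme volume is ignored in this sketch.
water_ul = TOTAL_UL - TEMPLATE_UL - sum(mix.values())

for name, volume in mix.items():
    print(f"{name}: {volume:.2f} µl")
print(f"water: {water_ul:.2f} µl")
```

The same function applies to the ZC334.N2 reactions, which use 200 µM each dNTP and 300 nM each primer in the same total volume.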

First, the template was denatured at 95°C for 5:00 minutes, and stored on ice. The
remainder of the reaction mixture was added, and the PCR reaction started as follows:

95°C for 2:00
35 cycles of: 95°C for 0:15
              54°C for 0:30
              72°C for 1:00

For the first round of PCR, the primers were: 

SL1-HIN CCCAAGCTTGGTTTAATTACCCAAGTTTGAG (SEQ ID NO:166)
A1.1R-XBA GCTCTAGACAATTTTGATATTAAATTTTGTCG (SEQ ID NO:167)

The first round of PCR yielded no detectable band as determined by agarose gel
electrophoresis, staining with ethidium bromide, and visualization on a UV light box.

A second round of PCR was performed as described above, with the following
changes: the template DNA was 1 µl of the first round PCR reaction, the reactions were run
for 20 cycles only, and a different (nested) 3' primer was used. The primers were:
SL1-HIN CCCAAGCTTTGGTTTAATTACCCAAGTTTGAG (SEQ ID NO:168)
1.1R-INN GCTCTAGATAAATTTTGTCGATTTTCAAGTTG (SEQ ID NO:169)

This reaction yielded a strong single band of DNA at approximately 1.3 kb.

To subclone the PCR product into a plasmid vector for DNA sequencing, we first
isolated the PCR product by agarose gel electrophoresis (45 µl of the second PCR reaction
run on a 1.2% gel). We excised the band with a razor blade, and purified the product from
the gel using the Geneclean kit (Bio101). We then ligated the PCR product into the plasmid
vector pCRII and transformed E. coli using the InVitrogen TA Cloning Kit. We screened
bacterial colonies for the correct plasmid by preparing mini-prep DNA (Biotechniques 8,
172-3), and analyzed the mini-prep DNA by EcoRI restriction digest and agarose gel
electrophoresis.

We sequenced the subcloned PCR products by thermal cycling, using the Big Dye
ready reaction mix sequencing kit. For each sequencing reaction, we added: approximately
100 ng of mini-prep DNA; 0.8 pmol of sequencing primer; 1 µl 5X BigDye ready reaction
buffer; 1.5 µl 80 mM Tris, 2 mM MgCl2, pH 9.0; and adjusted the volume to 10 µl with
distilled water. The following sequencing primers were used:
M13 Forward GTTTTCCCAGTCACG (SEQ ID NO:170)
M13 Reverse CAGGAAACAGCTATGAC (SEQ ID NO:171)

The sequencing reactions were thermal cycled using the following program:

96°C for 5:00
25 cycles of: 96°C for 0:30
              50°C for 0:15
              60°C for 4:00

We precipitated the cycled DNA with 75 µl 70% ethanol/5 mM MgCl2 by
incubating at room temperature for 20 minutes. We recovered the precipitated DNA by
centrifugation at 15,000 x g for 30 minutes, removed the supernatant, and further dried the
DNA pellet by vacuum centrifugation for 10 minutes. The sequencing reactions were
analyzed and the DNA sequence determined by gel electrophoresis and fluorescent detection
of sequencing products. The resulting DNA sequence for the Y52A1-derived product
indicated that there were in fact two open reading frames in this cDNA. The open
reading frame closest to the 5'-end of the message corresponding to this cDNA was not
related to the insulin family. Instead, the insulin-like sequences predicted from the search of
genomic DNA were found to correspond to the second open reading frame of this mRNA.
Comparison of this Y52A1-derived cDNA sequence with the genomic sequence suggested
that the likely explanation for this configuration of two open reading frames was that they
correspond to an operon where multiple mRNAs are derived from the same transcription
unit through different patterns of trans-splicing (see Zorio et al., 1994, Operons as a
common form of chromosomal organization in C. elegans, Nature 372, 270-272). Thus, it
was assumed that the insulin-like open reading frame in the Y52A1-derived product is
actually translated from an mRNA that may be generated using an alternative trans-spliced
leader such as SL2 or other leaders related to SL2.

PCR was used to amplify the presumptive insulin-like coding region from the larger
cDNA product derived above. PCR was performed as above, with the following changes:
the template was 1 µl of mini-prep DNA, and the following program was used:

95°C for 2:00
10 cycles of: 95°C for 0:30
              54°C for 0:30
              72°C for 1:00

The primers were:

Y52A1-1 CCCAAGCTTGAGCATTTTGTTGCTCTGCAAAATG (SEQ ID NO:172)
1.1R-INN GCTCTAGATTAAATTTTGTCGATTTCAAGTTG (SEQ ID NO:173)

This reaction yielded a 268 bp product.

To subclone the PCR product into a plasmid vector for DNA sequencing, we first
purified the PCR product with the Geneclean kit (Bio101), then digested the product with
HindIII and XbaI and isolated the PCR product by agarose gel electrophoresis (45 µl of the
PCR reaction run on a 1.2% gel). We excised the band with a razor blade, and purified the
product from the gel using the Geneclean kit. We then ligated the cut PCR product into the
plasmid vector pcDNA3.1 (InVitrogen) cut with HindIII/XbaI and transformed E. coli. We
screened bacterial colonies for the correct plasmid by preparing mini-prep DNA
(Biotechniques 8, 172-3), and analyzed the mini-prep DNA by PmeI restriction digest and
agarose gel electrophoresis.

We sequenced the subcloned PCR products exactly as above, except with the 
following sequencing primers: 

pcDNA3.1BGH Reverse TAGAAGGCACAGTCGAGG (SEQ ID NO:174)
T7 promoter primer TAATACGACTACTATAGGG (SEQ ID NO:175)


ZC334.N2 

The HindIII and XbaI cloning sites were used for many of the cDNAs, but not for
ZC334.N2, which has internal HindIII and XbaI sites. Instead, the 5' primer contains a BamHI
restriction site on the 5' end: CG-GGATCC-N=24; and the 3' primer contains an EcoRI site
on the end: CG-GAATTC-N=25.
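
The primer layout described here (a short 5' clamp, a restriction site, then the gene-specific bases) can be checked mechanically. The following is a minimal sketch: the helper name and the limit of three clamp bases are assumptions made for illustration, and only the five enzymes named in this example are considered.

```python
# Recognition sequences for the enzymes named in this example.
SITES = {
    "HindIII": "AAGCTT",
    "XbaI": "TCTAGA",
    "XhoI": "CTCGAG",
    "BamHI": "GGATCC",
    "EcoRI": "GAATTC",
}

def split_primer(primer, max_clamp=3):
    """Return (clamp, enzyme, gene_specific) if a known site starts within
    the first max_clamp bases of the primer, else None."""
    primer = primer.upper()
    for offset in range(max_clamp + 1):
        for enzyme, site in SITES.items():
            if primer.startswith(site, offset):
                return primer[:offset], enzyme, primer[offset + len(site):]
    return None

# First-round ZC334.N2 primers as given in the text.
for name, seq in [
    ("R334N2-L1BAM", "CGGGATCCCCGCACAAACTTATATGACAACTC"),
    ("R334N2-R1ECORI", "CGGAATTCGGTGTCTCATAATGGTAGTGGATAC"),
]:
    clamp, enzyme, specific = split_primer(seq)
    print(f"{name}: clamp={clamp} site={enzyme} N={len(specific)}")
```

For these two primers the gene-specific lengths come out as 24 and 25 bases, matching the N=24 and N=25 notation used in the text.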

The template DNA source was mixed stage C. elegans first strand cDNA, poly-A
selected and oligo-dT primed using the Gibco-BRL Superscript kit. The RNA was removed
by RNAse digestion, and the cDNA was diluted with TE buffer and adjusted to a final
concentration of approximately 70 ng/µl.

The cDNA was generated by the polymerase chain reaction (PCR) procedure, using
the Boehringer Mannheim Expand High Fidelity PCR System. Each reaction was
performed in a total volume of 50 µl. The components of the reaction were 3 µl (210 ng)
template DNA, 200 µM each dNTP, 300 nM each primer as described below, 1X buffer
with MgCl2 as supplied by the manufacturer, and 2.6 units of enzyme.

The reaction mixture was assembled with the above components, except for the first 
strand cDNA template. The first strand cDNA template was added subsequently, and the 
PCR reaction started as follows: 

95°C for 2:00
35 cycles of: 95°C for 0:15
              54°C for 0:30
              72°C for 1:00

For the first round of PCR, the primers were: 

R334N2-L1BAM CGGGATCCCCGCACAAACTTATATGACAACTC (SEQ ID NO:176)
R334N2-R1ECORI CGGAATTCGGTGTCTCATAATGGTAGTGGATAC (SEQ ID NO:177)

The first round of PCR yielded no detectable band as determined by agarose gel 
electrophoresis, staining with ethidium bromide, and visualization on a UV light box. 

A second round of PCR was performed as described above, with the following
changes: the template DNA was 0.5 µl of the first round PCR reaction, and a different
(nested) 3' primer was used. The primers were:

R334N2-L1BAM CGGGATCCCCGCACAAACTTATATGACAACTC (SEQ ID NO:178)
R334N2-R2ECORI CGGAATTCGCAAAAGAGAGGTATAGGGATAAAG (SEQ ID NO:179)

This reaction yielded a strong single band of DNA at approximately 400 bp.

To subclone the PCR product into a plasmid vector for DNA sequencing, we first
purified the PCR reaction using the Promega Wizard PCR preps DNA purification system
kit, according to the manufacturer's instructions, except the purified DNA was eluted from
the column using 25 µl of distilled water. The purified DNA was digested with BamHI and
EcoRI and the digested PCR product was isolated by agarose gel electrophoresis on a 1%
agarose gel. The DNA product was eluted by electrophoresis into 1% low-melting
temperature agarose. The product was purified from the gel by digestion of the low-melting
temperature agarose with 5 units of β-agarase I (New England Biolabs) for 1 hour at 40°C in
1X β-agarase buffer provided by the manufacturer, followed by precipitation of the DNA
with 1/10 volume of 3M sodium acetate, pH 5.2 and 2 volumes of isopropanol. Following
incubation of this mixture at -20°C for 30 minutes, the precipitated DNA was recovered by
centrifugation at 13,500 x g for 15 minutes, the supernatant was removed, the DNA pellet
was air-dried for 10 minutes and resuspended in 10-20 µl of distilled water. We then ligated
the PCR product into the plasmid vector pcDNA3.1 (InVitrogen), cut with BamHI and
EcoRI, and transformed E. coli. We screened bacterial colonies for the correct plasmid by
preparing mini-prep DNA using the Primm Labs Mini-Prep kit, and analyzed the mini-prep
DNA by BamHI and EcoRI restriction digestion and agarose gel electrophoresis.

We sequenced the subcloned PCR products by thermal cycling, using the Big Dye
ready reaction mix sequencing kit. For each sequencing reaction, we added: approximately
100-200 ng of mini-prep DNA; 0.8 pmol of sequencing primer; 1 µl 1X BigDye ready
reaction buffer (80 mM Tris, 2 mM MgCl2, pH 9.0) and adjusted the volume to 5 µl with
distilled water. The following sequencing primers were used:
pcDNA3.1BGH Reverse TAGAAGGCACAGTCGAGG (SEQ ID NO:180)
T7 promoter primer TAATACGACTACTATAGGG (SEQ ID NO:181)

The sequencing reactions were thermal cycled using the following program:

96°C for 4:00
25 cycles of: 96°C for 0:30
              50°C for 0:15
              60°C for 4:00

We purified the cycled DNA by centrifugation through Centriflex gel filtration
cartridge spin columns (Edge Biosystems), according to the manufacturer's instructions.
The purified DNA was dried by vacuum centrifugation for 30 minutes. The sequencing
reactions were analyzed and the DNA sequence determined by gel electrophoresis and
fluorescent detection of sequencing products.


ZC334.N3 

The first round PCR was performed exactly as for ZC334.N2, except that the 5' primer
contains a HindIII site, and the 3' primer contains an XbaI site, as do the Y52A1.N primers.
First round primers:

334N3-L1H3 CCCAAGCTTAAAGGCTTAGATGCAGAAAGACC (SEQ ID NO:182)
334N3-RXBA GCTCTAGAGGGATTAAAATCACTCTGTGATTAAG (SEQ ID NO:183)

The first round of PCR yielded no detectable band as determined by agarose gel 
electrophoresis, staining with ethidium bromide, and visualization on a UV light box. 

A second round of PCR was performed as described above; a different (nested) 5'
primer was used. The primers were:
334N3-L2H3 CCCAAGCTTTAAAGGTGGACATTGTAGAAGGTTG (SEQ ID NO:184)
334N3-RXBA GCTCTAGAGGGATTAAAATCACTCTGTGATTAAG (SEQ ID NO:185)

This reaction yielded several different sized DNA products, including a strong band
of DNA at the predicted size of approximately 350 bp. This 350 bp product was subcloned
and sequenced exactly as described for ZC334.N2.

ZC334.N4 

The first round PCR was performed exactly as for ZC334.N2. Primers contain HindIII
and XbaI sites, as for ZC334.N3. First round primers:
R334N4-L1H3 CCCAAGCTTCCTTCACTTCTCAGCGAAGGAAATG (SEQ ID NO:186)
R334N4-RXBA GCTCAGAGTGCTCATGCTCCGTTATTTGTGC (SEQ ID NO:187)

This reaction yielded a strong single band of DNA at approximately 380 bp after one 
round of PCR. This product was subcloned and sequenced exactly as described for 
ZC334.N2. 


ZC334.N5 

The first round PCR was performed exactly as for ZC334.N2. The 5' primer contains an
EcoRI restriction site on the 5' end, i.e. CG-GAATTC-N=26; and the 3' primer contains an
XhoI site on the end, i.e. CCG-CTCGAG-N=24 for cloning; the HindIII and XbaI sites,
which were used as cloning sites for many of the cDNAs, were not used in this case since
ZC334.N5 has both internal HindIII and XbaI sites. First round primers:
R334N5-L1ECORI CGGAATTCCTAGAATTTTCACCCCAAATGTTCAG (SEQ ID NO:188)
R334N5-RXHO CCGCTCGAGAAATGTAAGTGATTGGCAAGTTGG (SEQ ID NO:189)

This reaction yielded a strong single band of DNA at approximately 300 bp after one
round of PCR. This product was subcloned and sequenced exactly as described for
ZC334.N2.

ZC334.N6 

The first round PCR was performed exactly as for ZC334.N2. Primers contain HindIII
and XbaI sites, as for ZC334.N3. First round primers:

334N6-L1H3 CCCAAGCTTAGAGACTTAGACGCAAAGAGGACC (SEQ ID NO:190)
334N6-RXBA GCTCTAGAGCAGGAAAATTAGCTAAAACATAATG (SEQ ID NO:191)

The first round of PCR yielded no detectable band as determined by agarose gel
electrophoresis, staining with ethidium bromide, and visualization on a UV light box.

A second round of PCR was performed using the same two primers that were used
in the ZC334.N6 first round reaction, as described above. This reaction yielded several
products, including a strong band of DNA at the predicted size of approximately 450 bp.
This 450 bp product was subcloned and sequenced exactly as described for
ZC334.N2.

ZC334.N7 

The first round PCR was performed exactly as for ZC334.N2. The 5' primer contains an
EcoRI restriction site on the 5' end, i.e. CG-GAATTC-N=24; and the 3' primer contains an
XhoI site on the end, i.e. CCG-CTCGAG-N=25 for cloning; the HindIII and XbaI sites,
which were used as cloning sites for many of the cDNAs, were not used in this case since
ZC334.N7 has both internal HindIII and XbaI sites. First round primers:

R334N7-L1ECORI CGGAATTCGGCGAAACACTTCCGCCAACTCAC (SEQ ID NO:192)
R334N7-R1XHO CCGCTCGAGACCTACCTCAACTTGGAGGATAAC (SEQ ID NO:193)

The first round of PCR yielded no detectable band as determined by agarose gel
electrophoresis, staining with ethidium bromide, and visualization on a UV light box.

A second round of PCR was performed using the same two primers that were used
in the ZC334.N7 first round reaction, as described above. This reaction yielded several
products, including a band of DNA at the predicted size of approximately 650 bp. This 650
bp product was subcloned and sequenced exactly as described for ZC334.N2.


T10D4.N 

The first round PCR was performed exactly as for ZC334.N2. Primers contain HindIII
and XbaI sites, as for ZC334.N3. First round primers:

D4N-L2H3 CCCAAGCTTCCTTGCACCTGCCTTCAACCATCAC (SEQ ID NO:194)
D4N-RXBA GCTCTAGATATTCTGACCCCAAAATGACAATC (SEQ ID NO:195)

This reaction yielded a single band of DNA at approximately 700 bp after one round
of PCR. This product was subcloned and sequenced exactly as described for ZC334.N2.



T10D4.N2

The first round PCR was performed exactly as for ZC334.N2. Primers contain HindIII
and XbaI sites, as for ZC334.N3. First round primers:

RD4N2-L1H3 CCCAAGCTTTTCTGCAGACTTGCAAGGTTAGTTC (SEQ ID NO:196)
RD4N2-R1XBA GCTCTAGAATTCACAAAATAATCAAGACAATC (SEQ ID NO:197)

The first round of PCR yielded no detectable band as determined by agarose gel 
electrophoresis, staining with ethidium bromide, and visualization on a UV light box. 

A second round of PCR was performed using the same two primers that were used 
in the T10D4.N2 first round reaction, as described above. This reaction yielded a strong 
band of DNA at approximately 400 bp. This product was subcloned and sequenced exactly 
as described for ZC334.N2. 






EXAMPLE 2: EXPRESSION ANALYSIS 

Analysis of expression patterns of C. elegans insulin-like genes was carried out by
fusing the transcriptional control regions identified for each gene to a reporter gene
encoding green fluorescent protein (GFP), a protein whose expression is easily detected by
its fluorescence in vivo. Each reporter gene so constructed was then expressed as a
transgene in transgenic nematodes. Table 2 entitled "Expression Data" sets forth the

For each C. elegans insulin-like gene, putative promoter/enhancer regions were
identified in the adjacent genomic sequence (GenBank®, C. elegans Genome Project) as
regions extending from the predicted start codon of each insulin-like gene to the next gene
upstream, identified using the GeneFinder program. If the putative promoter/enhancer
region was 6 kilobase pairs (kbp) or less in size, synthetic oligonucleotide primers were
designed to amplify the entire region by PCR. For F13B12.N, ZK75.2 and M04D8.1,
the putative promoter/enhancer region was more than 6 kbp or was unbounded (see Table 2)
by a clearly-defined upstream gene. In these instances, a 2 to 6 kbp segment of upstream
region was arbitrarily chosen for amplification, based on available genomic sequence
information and favorable primer annealing sites. In addition to the gene-specific sequences
incorporated into the PCR primers, each primer also contained restriction enzyme cleavage
sites to allow easy insertion into the GFP reporter vector system (pPD117.01): Asc I
cleavage sites were incorporated in primers positioned upstream of each
enhancer/promoter region, and either Age I or Kpn I sites were incorporated into each primer
positioned downstream of the promoter/enhancer. The specific primer pair sequences used to
amplify the promoter/enhancer regions of each gene are listed below.
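
The selection rule described in this paragraph can be summarized as a small decision function. This is a sketch only: the function name and the use of `None` to represent a region with no clearly defined upstream gene are conventions assumed here, and the actual choice of segment within the 2 to 6 kbp range was made by hand from the available sequence.

```python
MAX_FULL_REGION_KBP = 6.0  # threshold stated in the text

def amplification_plan(region_kbp):
    """Decide how to amplify a putative promoter/enhancer region.

    region_kbp: distance from the predicted start codon to the next upstream
    gene, or None when no clearly defined upstream gene bounds the region.
    """
    if region_kbp is not None and region_kbp <= MAX_FULL_REGION_KBP:
        return "amplify entire region"
    return "choose a 2 to 6 kbp upstream segment"

# Hypothetical region sizes, for illustration only.
for size in (1.3, 6.0, 9.5, None):
    print(size, "->", amplification_plan(size))
```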

List of primers for promoter/enhancer amplification

Gene (PCR product size in kbp)
Sense and antisense primers

F13B12.N (5.2)
TTGGGCGCGCCGTCTTGCATGCAGTTGTCACG (SEQ ID NO:72)
CCAACCGGTATCATTGCGTACTGTCGTAGCGTGTG (SEQ ID NO:73)

ZK75.2 (3.7)
TTGGGCGCGCCTGCTACCGTGGGAATTTTACAAG (SEQ ID NO:74)
CCAACCGGTATCATGGTAGATTTTAGAATGGAAAG (SEQ ID NO:75)

ZK75.3 (5.7)
TTGGGCGCGCCGGAGTTCATCTGGAGGTCACATC (SEQ ID NO:76)
CCAACCGGTATCATTATTCAGAACAGGAATTGATAAATG (SEQ ID NO:77)

ZK75.1 (5.7)
TTGGGCGCCAGATAAATACAGAATGGGCGGAG (SEQ ID NO:78)
CCAACCGGTATCATTCTCTTGGAGCTTTTGAAAAAC (SEQ ID NO:79)

ZK84.N2 (1.7)
TTGGGCGCGCCAGTCGTCCAACAAGCCATCTCC (SEQ ID NO:80)
CCAACCGGTTGCATTTTCCTTGAAGATTGAAG (SEQ ID NO:81)

ZK84.6 (3.7)
TTGGGCGCGCCTAGATTTTCTCCATTCACAAAC (SEQ ID NO:82)
CCAACCGGTATCATTATAATGATATGGATAACGG (SEQ ID NO:83)

ZK1251.2 (0.6)
TTGGGCGCGCCAATCGTTTTCATCATTTTGCTTC (SEQ ID NO:84)
CCAACCGGTATCATCTGGAAAAGTAATATTATAT (SEQ ID NO:85)

ZK1251.N (1.3)
TTGGGCGCGCCTGAAATCTTTATATCCTCTTCAC (SEQ ID NO:86)
CCAACCGGTATCATCTGGAAATAATTAATATCAG (SEQ ID NO:87)

C06E2.N (3.0)
TTGGGCGCGCCTAACACGTGCATTGGAGGCGGAG (SEQ ID NO:88)
CCAACGGTATCATCGTTTCACTCCTCGAATTATTTG (SEQ ID NO:89)

C17C3.N (2.3)
TTGGGCGCGCCATTGGTATCACAAGGATCAAGC (SEQ ID NO:90)
CCAACCGGCATTTTTGTTTTTGGCTGTGATTA (SEQ ID NO:91)

C17C3.4 (1.4)
TTGGGCGCGCCAATTTTGACGACGATCTCCTTC (SEQ ID NO:92)
CCAACCGGTATCATATTTAACGATTCCTACACAAACC (SEQ ID NO:93)

ZK84.N (2.1)
TTGGGCGCGCCGTGTGGAGGTGGTGAATCC (SEQ ID NO:94)
CGGGGTACCCTCATTTCAAAGAAATGTTGAATA (SEQ ID NO:95)

M04D8.1 (3.0)
TTGGGCGCGCCGGAGCCGAACAAGAAAAACCTAC (SEQ ID NO:96)
CCAACCGGTTTCATGGTTCAACTCAAAAAGGAA (SEQ ID NO:97)

M04D8.2 (2.2)
TTGGGCGCGCCAGTTCGTCTCAGCATCATCTTGC (SEQ ID NO:98)
CCAACCGGTTTCATGGTTCAACTCAAAAAGGAA (SEQ ID NO:99)

M04D8.3 (1.6)
TTGGGCGCGCCATGGGATTTTCAGACTCTCAG (SEQ ID NO:100)
CCAACCGGTAACATTATCGAAATTCCAGAAATCCG (SEQ ID NO:101)

The following PCR conditions were used: 95°C for 2 min; either 15 cycles (genomic
DNA templates) or 10 cycles (cosmid DNA templates) of the following steps: (1) 95°C for
15 sec, (2) 50°C for 30 sec, and (3) 68°C for a time equivalent to 1 min per kbp of expected
product length; and 10 additional cycles with 20 sec added per cycle at step (3). N2
genomic DNA was used as template, except for ZK75.2, ZK75.3, ZK75.1, and ZK84.6, for
which cosmid DNA was used. The PCR products were digested with either AscI-AgeI or
AscI-KpnI, ligated into similarly-digested pPD117.01 GFP fusion vector, and transformed
into E. coli. DNA from the resulting clones was prepared using a Qiagen kit, and the
correct structure and reading frame of fusion between promoter region and GFP coding
region was checked by DNA sequencing.
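
The extension step at 68°C scales with the expected product length, so the per-cycle times differ for each promoter fragment. The sketch below lists them for a given product size and template type. It assumes that the 20 sec increment accumulates from one additional cycle to the next, which is one reading of "20 sec added per cycle"; the function name is hypothetical.

```python
def extension_times_s(product_kbp, template):
    """Per-cycle 68 deg C extension times in seconds.

    template: "genomic" (15 initial cycles) or "cosmid" (10 initial cycles).
    Assumes the 20 s increment accumulates over the 10 additional cycles.
    """
    initial_cycles = {"genomic": 15, "cosmid": 10}[template]
    base = round(product_kbp * 60)  # 1 min per kbp of expected product
    first = [base] * initial_cycles
    extra = [base + 20 * k for k in range(1, 11)]
    return first + extra

# F13B12.N promoter fragment: 5.2 kbp, amplified from genomic DNA.
times = extension_times_s(5.2, "genomic")
print(len(times), "cycles; first", times[0], "s; last", times[-1], "s")
```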

GFP fusion construct injection

Each GFP fusion construct was injected into wild type worms using a standard
protocol for C. elegans transformation (see Mello et al., 1991, "Efficient gene transfer in C.
elegans: extrachromosomal maintenance and integration of transforming sequences",
EMBO J. 10:3959-3970) at a concentration of 100 µg/ml each GFP fusion plasmid plus 100
µg/ml pRF4 rol-6(d) transformation marker. Stably transformed strains exhibiting a Roller
phenotype were established and examined for fluorescence by inspection using an Axioplan
microscope (Zeiss). For each GFP fusion construct, two transformant lines which exhibited
the highest levels of fluorescence were chosen for further analysis. Duplicate constructs
were analyzed for all promoter/enhancer region-GFP fusions, and the patterns of GFP
expression were found to be identical for all duplicates (see Table 2). Duplicate constructs
were derived from independent PCR reactions for all genes except ZK75.1, ZK1251.N, and
C06E2.N.

Structural categories of genes 

Comparison of the predicted coding regions of C. elegans insulin-like genes reveals
a remarkable and unexpected diversity of structures, which are nonetheless clear variations
on the common theme that characterizes the insulin superfamily. Structural domains within
each predicted C. elegans insulin-like protein are annotated in the sequences set forth in
FIG. 4 through FIG. 34. In FIG. 3, the sequences of predicted mature forms of the proteins
are aligned to one another to highlight features that tend to be conserved compared with the
insulin superfamily, as well as to emphasize features that distinguish different Classes of C.
elegans insulin-like proteins.

We have divided the currently-characterized C. elegans insulin-like genes into four 
Classes based on the protein primary structural characteristics as set forth below. 


CLASS I: One C. elegans insulin-like gene, F13B12.N, has been assigned to Class I. Class
I is characterized as having a cleavable C peptide separating the B and A chains. This C
peptide possesses processing sites for prohormone convertases, similar to that of vertebrate
insulin. Ends generated by proteolytic removal of the C peptide are indicated by the
symbols "«" and "»" in FIG. 3 for the B and A peptides. Further, Class I is characterized
as having an extra pair of Cys residues present which is not found in vertebrate insulins.
One Cys residue is located in the B chain and the other Cys residue is located in the A chain.
This unique extra pair of Cys residues presumably forms an extra inter-chain disulfide bond.

CLASS II: Nine C. elegans insulin-like genes, ZK75.1, ZK75.2, ZK75.3, ZK84.6,
ZK84.N2, ZK1251.2, ZK1251.N, C06E2.N and T08G5.N, have been assigned to Class II.
Class II is characterized by the absence of a C peptide. Further, Class II is characterized as
having an extra pair of Cys residues.

Still further, Class II is characterized as having a "Pro peptide," which is presumably
removed by proteolytic processing from the mature hormone. This Pro peptide is located
between the signal sequence and the beginning of the B domain (i.e., similar to the Pro
peptide of locust LIRP insulin-like protein). The B and A regions or domains presumably
are not cleaved into separate chains in this Class II and the following Classes III-IV.


T08G5.N is unique in that there is a repositioning of one of the Cys residues in the
B domain. In this case, the second Cys residue appears to be moved by four amino acid
residues from the end of the presumptive central helix of the B domain towards the middle
of the central helix. The repositioning places the Cys residue such that it would project
from the same side of the presumptive B domain helix and remain available for disulfide
bond formation with the normal partner Cys residue at the end of the second helix of the A
domain. Although the spacing of Cys residues in the B domain is unique to insulin-like
protein T08G5.N, it is anticipated that this Cys residue repositioning can be accommodated
with relatively small changes in the tertiary structure typical of the insulin superfamily, and
no significant changes in secondary structure motifs.



CLASS III: Ten C. elegans insulin-like genes, C17C3.4, C17C3.N, C17C3.N2, F41G3.N,
F41G3.N2, F56F3.6, Y52A1.N, T28B8.N, T10D4.N and T10D4.N2, have been assigned to
Class III. Class III is characterized by the absence of a C peptide. Further, Class III is
characterized as having the same number of Cys residues in the B and A domains as found
in vertebrate insulin. Some members of this Class lack an intron positioned between the B
and A domains within the genomic sequence. FIG. 3 denotes the lack of an intron in this
position by the symbol " " at the C-terminus of the B domain and N-terminus of the A
domain for C17C3.N2, F41G3.N2, and F56F3.6, and the most N-terminal of the three
insulin-like modules of T10D4.N, designated as T10D4.Na, as indicated in FIG. 3.

CLASS IV: Eleven C. elegans insulin-like genes, M04D8.1, M04D8.2, M04D8.3,
ZK84.N, ZC334.N, ZC334.N2, ZC334.N3, ZC334.N4, ZC334.N5, ZC334.N6 and
ZC334.N7, have been assigned to Class IV. Class IV is characterized by the absence of a C
peptide. Further, Class IV is characterized as having an extra pair of Cys residues, as in
Classes I and II. Still further, Class IV is characterized by the absence of a Cys pair in the A
domain; the missing Cys pair in most cases is replaced by hydrophobic residues.
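
The four Classes can be told apart by three primary-structure features named in the descriptions above. The following sketch encodes that logic; it is an illustrative reading of the Class definitions rather than the procedure actually used to assign genes, the function and argument names are hypothetical, and the Pro peptide of Class II is not used as a criterion here.

```python
def insulin_like_class(has_c_peptide, has_extra_cys_pair, has_a_domain_cys_pair):
    """Assign Class I-IV from three features taken from the Class descriptions.

    has_c_peptide: cleavable C peptide between the B and A chains (Class I only)
    has_extra_cys_pair: extra Cys pair not found in vertebrate insulins
    has_a_domain_cys_pair: the Cys pair within the A domain is present
    """
    if has_c_peptide:
        return "I"
    if not has_extra_cys_pair:
        return "III"  # same number of Cys residues as vertebrate insulin
    return "II" if has_a_domain_cys_pair else "IV"

# Feature combinations corresponding to each Class description.
examples = {
    "C peptide, extra Cys pair": (True, True, True),
    "no C peptide, extra Cys pair": (False, True, True),
    "no C peptide, vertebrate-like Cys count": (False, False, True),
    "no C peptide, extra pair, A-domain pair absent": (False, True, False),
}
for label, features in examples.items():
    print(label, "-> Class", insulin_like_class(*features))
```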



Structural comparison with known genes 

With respect to the well-characterized structures of previously-known insulin
superfamily proteins, each of the C. elegans insulin-like proteins identified herein has at
least one novel and significant structural feature which is not typical of the previously-
characterized insulin superfamily proteins. These features include: absence of a C peptide;
presence of an extra inter-chain Cys pair; absence of a Cys pair in the A chain domain;
altered spacing of Cys residues; and/or multiple B domain and A domain pairs in the same
polypeptide. However, these primary structural differences can be accommodated within
the overall secondary and tertiary structural framework that is common to the insulin
superfamily, as described below.

Peptide domains

Only one of the C. elegans insulin-like genes possesses a "connecting" or C peptide
between the A and B chain domains (i.e., F13B12.N, Class I). Since the C-terminus of the
B chain and the N-terminus of the A chain are relatively close in space within the tertiary
structure of insulin, it is quite possible that a continuous main chain could connect
presumptive B and A domains without grossly disturbing the overall insulin fold. There is
an intriguing aspect of the gene organization of the C. elegans insulin-like genes that
supports the notion of structural motifs corresponding to the B and A peptides of the insulin
superfamily, despite the lack of a C peptide. All C. elegans insulin-like genes have introns,
and nearly all genes encoding proteins that lack an identifiable C peptide (Classes II through
IV) have an intron positioned between the B domain and A domain as indicated in FIG. 3
(the only exceptions are F56F3.6, C17C3.N2, F41G3.N2, and the most N-terminal insulin-
like module of T10D4.N, indicated as T10D4.Na). Indeed, even the Class I C. elegans
insulin-like gene, which has a C peptide, also has an intron positioned at the boundary of the
B and C peptides. In vertebrates, the most common exon-intron structure of insulin-like
genes is one with an intron positioned either at the boundary of, or within, the C peptide
coding region.

One of the C. elegans insulin-like genes, T10D4.N, is especially remarkable in terms
of domain organization, as this gene encodes a single polypeptide which possesses three
tandem pairs of B and A domains, or insulin-like "modules", in effect producing a trimeric
insulin. Multiple insulin-like modules within the same polypeptide have not been observed
previously in any organism. The sequences of the three insulin-like modules within the
T10D4.N polypeptide are labeled in FIG. 3 as T10D4.Na, T10D4.Nb, and T10D4.Nc,
extending in order from the N-terminus to the C-terminus of the polypeptide. The symbol
"-" at the C-terminus of sequences for modules T10D4.Na and T10D4.Nb signifies that the
polypeptide sequence continues with the first residue of the sequence in the line below. It is
noteworthy that the tandem insulin-like modules in T10D4.N are connected by hydrophobic
spacers at the end of the A domain of each of modules T10D4.Na and T10D4.Nb. Further, the
C-terminal module T10D4.Nc contains a tail extending from the end of the A domain that has
the same length and hydrophobic character as the connecting spacer regions. It is also
intriguing that immediately adjacent to the T10D4.N gene within genomic DNA is another
insulin-like gene, T10D4.N2, oriented in the opposite direction, which consists of the
typical single insulin module. T10D4.N2 is very closely related in primary sequence to the
individual modules that comprise T10D4.N (see sequence alignments in FIG. 3) and also
possesses a tail extending from the end of the A domain that is similar in size and character
to the tail and connecting spacers in the trimeric T10D4.N.

Cys residues

Most C. elegans insulin-like proteins possess an extra pair of Cys residues (Classes
I, II and IV) and it is striking that these residues are consistently positioned (see the
alignment of FIG. 3). One extra Cys is found toward the C-terminal end of the B chain (i.e.,
B region or domain) and the other extra Cys is found toward the C-terminal end of the A
chain (i.e., A region or domain). These two positions are expected to be very close in space
within the known tertiary structure of insulin superfamily proteins. Thus, it is quite possible
that the extra Cys residues in the C. elegans insulin-like proteins form a disulfide bond that
further stabilizes the structure. This situation is reminiscent of that previously noted for
extra Cys residues within the MIP family of insulin-like proteins from freshwater snail.
However, in the case of the MIP proteins, the extra Cys residues are positioned at the
N-terminal ends of the A and B chains (see FIG. 2).

Some C. elegans insulin-like proteins (i.e., Class IV) are missing a pair of Cys
residues in the A domain that are invariably found in the previously-characterized insulin
superfamily members and which form an intra-chain disulfide bond that stabilizes a bend in
the A chain structure. It is notable that, in many of the C. elegans Class IV proteins, there
appears to be a concerted replacement of these two Cys residues with either aromatic or
aliphatic residues. Such substitutions are consistent with the normal placement of this
disulfide linkage within the hydrophobic core between the A and B chains. It seems that in
these C. elegans Class IV insulin-like proteins, a strong covalent linkage has been
replaced by a weaker stacking or hydrophobic interaction between side chains in these
positions. It is relevant that all C. elegans insulin-like proteins that are "missing" a pair of
Cys residues within the A domain also have an "extra pair" of Cys residues at the ends of
the B and A domains, as described above.

Several C. elegans insulin-like proteins are highly unusual by virtue of having an
abnormal spacing between conserved Cys positions (T08G5.N, Y52A1.N, F56F3.6,
T28B8.N, T10D4.N, T10D4.N2 and ZC334.N; see FIG. 3). Nonetheless, as indicated in the
sequence alignment of FIG. 3, the changes in spacing can be viewed as relatively small
alterations which are not expected to cause large-scale changes in structure that would
deviate from that typical of the insulin superfamily. The "repositioning" of one Cys residue
within the B domain of T08G5.N was discussed previously. For other insulin-like genes
with altered spacing of Cys residues, the changes in spacing can be viewed as small
insertions or deletions within structural transitions of the typical insulin fold. Thus,
Y52A1.N can be viewed as having a deletion of three residues (symbolized by "—" in FIG.
3) that shortens the loop connecting the two helices of the A domain. Conversely, ZC334.N
and insulin-like modules T10D4.Nb and T10D4.Nc of T10D4.N can all be viewed as
having an insertion of a dipeptide of either "Ser Gly", "Pro Glu", or "Ser Ala", respectively,
within the loop connecting the two helices of the A domain. Also, T10D4.N2 and modules
T10D4.Na, T10D4.Nb, and T10D4.Nc of T10D4.N can each be viewed as having an
insertion of a single residue, either "Ile", "Phe", "Val", or "Val", respectively, at the end of
the second helix of the A domain. Finally, F56F3.6 and T28B8.N can be viewed as having
an insertion of a tripeptide having the sequence "Pro Pro Gly" within the turn that
immediately precedes the central helix of the B domain. It is particularly intriguing that the
presence of both insertions and deletions of this sort within the C. elegans insulin-like
proteins points to an ability to accommodate more variation within the insulin protein
structure than had been appreciated from sequences of previously described insulin
superfamily proteins.

EXAMPLE 3: GENERATION AND GENETIC ANALYSIS OF NEMATODES
WITH ALTERED INSULIN-LIKE GENES

C. elegans insulin-like genes are important tools for creating genetically-engineered
nematodes. Genetically-engineered nematodes may harbor: (a) deletions or insertions in an
insulin-like gene or genes; (b) interfering RNAs derived from such genes; and/or (c)
transgenes for mis-expression of wild-type or mutant forms of such genes. Such C. elegans
strains with laboratory-generated alterations in insulin-like genes are useful for many
purposes. Examples of such purposes include: (a) identification of insulin-like genes that
participate in biochemical and/or genetic pathways that constitute possible pesticide targets,
as judged by phenotypes such as non-viability, block of normal development, defective
feeding, defective movement, or defective reproduction; (b) identification of insulin-like
genes that participate in genetic and/or biochemical pathways that relate to therapeutic
applications associated with the insulin superfamily hormones, such as metabolic control,
growth regulation, differentiation, reproduction, and aging, through the generation of
phenotypes associated with those functions in the altered C. elegans strains; and (c) use as
substrates for large-scale genetic modifier screens aimed at systematic identification of other
components of these genetic and/or biochemical pathways that serve as novel drug targets,
diagnostics, prognostics, therapeutic proteins, pesticide targets or protein pesticides.

Methods for creation and analysis of C. elegans strains having modified expression
of insulin-like genes are described below. Expression modification methods include any
method known to one skilled in the art. Specific examples include but are not limited to
EMS chemical mutagenesis, Tc1 transposon mutagenesis, double-stranded RNA
interference, and transgene-mediated mis-expression. In the creation of transgenic animals,
it is preferred that heterologous (i.e., non-native) promoters be used to drive transgene
expression.


EXAMPLE 4: EMS CHEMICAL DELETION MUTAGENESIS 

Ethyl methanesulfonate (EMS) is a commonly-used chemical mutagen for creating
loss-of-function mutations in genes-of-interest in C. elegans. Approximately 13% of
mutations induced by EMS are small deletions. With the methods described herein, there is
approximately a 95% probability of identifying a deletion-of-interest by screening 4 x 10^6
EMS-mutagenized genomes. Briefly, this procedure involves creating a library of several
million mutagenized C. elegans which are distributed in small pools in 96-well plates, each
pool composed of approximately 400 haploid genomes. A portion of each pool is used to
generate a corresponding library of genomic DNA derived from the mutagenized
nematodes. The DNA library is screened with a PCR assay to identify pools that carry
genomes with deletions-of-interest, and mutant worms carrying the desired deletions are
recovered from the corresponding pools of the mutagenized animals. Although EMS is a
preferred mutagen to generate deletions, other mutagens can be used that also provide a
significant yield of deletions, such as X-rays, gamma-rays, diepoxybutane, formaldehyde
and trimethylpsoralen with ultraviolet light.
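
The stated screening depth can be related to a per-genome deletion frequency with a short calculation. The sketch below assumes, purely for illustration, that deletions at the locus-of-interest arise independently in each mutagenized genome, so that the probability of finding at least one follows a Poisson model; this model and the function names are assumptions and are not part of the described method.

```python
import math


def per_genome_deletion_rate(target_probability: float, genomes_screened: float) -> float:
    """Rate f such that 1 - exp(-f * N) equals the target detection probability."""
    return -math.log(1.0 - target_probability) / genomes_screened


def detection_probability(genomes_screened: float, rate: float) -> float:
    """Probability of recovering at least one deletion among N screened genomes."""
    return 1.0 - math.exp(-rate * genomes_screened)


# Figures taken from the text: 95% probability at 4 x 10^6 genomes.
IMPLIED_RATE = per_genome_deletion_rate(0.95, 4e6)

if __name__ == "__main__":
    print(f"implied deletion rate per genome: {IMPLIED_RATE:.3g}")
    for n in (1e6, 2e6, 4e6):
        print(f"{n:.0e} genomes -> P(detect) = {detection_probability(n, IMPLIED_RATE):.2f}")
```

Under this assumed model the implied frequency is on the order of one deletion-of-interest per 1.3 million genomes, which is why libraries of several million genomes are screened.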

Nematodes may be mutagenized with EMS using any procedure known to one
skilled in the art, such as the procedure described by Sulston and Hodgkin (1988, Methods,
pp. 587-606, in The nematode Caenorhabditis elegans, Wood, Ed., Cold Spring Harbor
Laboratory Press, Cold Spring Harbor, New York). Following exposure to the mutagen,
nematodes are dispensed into petri dishes, incubated one to two days, and embryos isolated
by hypochlorite treatment (Id.). Embryos are allowed to hatch and L1 larvae are collected
following overnight incubation. The larvae are distributed in petri plates at an average
density of 200 animals per plate and incubated for 5 to 7 days until just starved. A sample
of nematodes is collected from each plate by washing with distilled water, and
the nematodes washed from each plate are placed in one well of a 96-well plate. Worms are
lysed by addition of an equal volume of lysis buffer (100 mM KCl, 20 mM Tris-HCl pH
8.3, 5 mM MgCl2, 0.9% Nonidet P-40, 0.9% Tween-20, 0.02% gelatin, and 400 µg/ml
proteinase K) followed by incubation at -80°C for 15 minutes, 60°C for 3 hours, and 95°C
for 15-30 minutes. The DNA-containing lysates are stored in the plates at -80°C until
analyzed further. Live nematodes from each plate are aliquoted into tubes within racks for
storage at -80°C, such that the physical arrangement of tubes of live animals is the same as
the arrangement of corresponding DNA lysates in the 96-well plates.

A pooling strategy is used to allow efficient PCR screening of the DNA lysates. The
pools are made from each 96-well plate by mixing 10 µl of lysate from each of the 8 wells
comprising each column of wells in a plate. The pooled lysates for each column are used for
screening with PCR. PCR primers are designed for each locus-of-interest to be about 1.5 to
12 kb apart, depending on the size of the locus, such that deletions encompassing the entire
coding regions of insulin-like genes can be detected following a previously-described
procedure (see Plasterk, 1995, Reverse genetics: from gene sequence to mutant worm,
Methods in Cell Biology 48:59-80). For each region, two sets of primer pairs are chosen for
carrying out a nested PCR strategy such that an outside set is used for the first round of PCR
and an inside set is used for the second round of PCR. The second round of PCR is performed
to achieve greater specificity in the reaction.
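
The column pooling scheme can be sketched as a mapping from each column pool back to the wells that contributed to it. The code below is only an illustration under the assumption of a standard 96-well layout (rows A-H, columns 1-12); the plate identifier and function names are hypothetical.

```python
ROWS = tuple("ABCDEFGH")       # 8 wells per column
COLUMNS = tuple(range(1, 13))  # 12 columns per 96-well plate
LYSATE_PER_WELL_UL = 10        # volume taken from each well, from the text


def column_pools(plate_id: str) -> dict:
    """Map each (plate, column) pool to the 8 wells whose lysates are mixed into it."""
    return {
        (plate_id, column): [f"{row}{column}" for row in ROWS]
        for column in COLUMNS
    }


def pool_volume_ul() -> int:
    """Total volume of one column pool."""
    return LYSATE_PER_WELL_UL * len(ROWS)


def wells_to_retest(positive_pool: tuple) -> list:
    """Individual wells to screen after a column pool gives a candidate deletion band."""
    plate_id, column = positive_pool
    return [(plate_id, f"{row}{column}") for row in ROWS]


if __name__ == "__main__":
    pools = column_pools("plate-001")
    print(len(pools), "pools per plate,", pool_volume_ul(), "ul each")
    print(wells_to_retest(("plate-001", 7)))
```

Pooling by column reduces the first pass from 96 reactions per plate to 12, at the cost of a second pass of 8 single-well reactions for each positive pool.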

The first round PCR reactions are performed in duplicate for each pool, with
reactions carried out in a 96-well plate. Each reaction contains 18 µl of the following
mixture and 2 µl of each pooled lysate:

reaction buffer provided by the manufacturer (e.g., Boehringer Mannheim
Biochemicals)
2.5 mM MgCl2
0.2 mM each dNTP
0.5 µM each gene-specific primer
1.7 units Expand Hi Fidelity enzyme mix (Boehringer Mannheim
Biochemicals)
dH2O to 18 µl per reaction

The reactions are carried out using the same general temperature cycling parameters
except that the extension time is varied depending on the normal distance between the
primer pairs as follows:

4 kb wild-type product or shorter: 1 minute extension time
4-6 kb wild-type product: 2 minute extension time
6-12 kb wild-type product: 4 minute extension time

The temperature cycling conditions used are 94°C for 3 minutes, then 35 cycles of
the following: 94°C for 40 seconds, 55°C for 1 minute, and 72°C for the number of
minutes of extension time described above.
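
The extension-time rule and cycling conditions can be written out as a small lookup. This is a sketch only: the function names are hypothetical, the handling of products of exactly 4 kb or 6 kb follows the lower tier as an assumption, and products longer than 12 kb are treated like the longest stated tier because the text does not specify them.

```python
def extension_minutes(wild_type_kb: float) -> int:
    """Extension time chosen from the expected wild-type product size."""
    if wild_type_kb <= 0:
        raise ValueError("product size must be positive")
    if wild_type_kb <= 4:
        return 1
    if wild_type_kb <= 6:
        return 2
    # The text states this tier as 6-12 kb; longer products are handled
    # the same way here as an assumption.
    return 4


def cycling_program(wild_type_kb: float) -> list:
    """Return (step, temperature in C, seconds, cycles) tuples for one PCR round."""
    extension_seconds = extension_minutes(wild_type_kb) * 60
    return [
        ("initial denaturation", 94, 180, 1),
        ("denaturation", 94, 40, 35),
        ("annealing", 55, 60, 35),
        ("extension", 72, extension_seconds, 35),
    ]


if __name__ == "__main__":
    for size_kb in (2.1, 3.6, 4.7, 12.0):
        print(size_kb, "kb ->", extension_minutes(size_kb), "min extension")
```

Because a deletion allele yields a shorter product than wild type, an extension time sized for the wild-type distance is also sufficient for the deletion product.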

The second round of PCR is performed essentially as above, except that 15 µl of
mixture containing the following is aliquoted to each reaction:

reaction buffer provided by the manufacturer
1.5 mM MgCl2
0.2 mM each dNTP
0.5 µM each gene-specific primer
1.7 units of Expand Hi Fidelity enzyme mix
dH2O to 15 µl per reaction

A small amount of first-round reaction products is transferred to the second-round
reaction mixtures using a 96-pin replicator. The same temperature cycling sequence is used
for the second round as described for the first round.
Products of the second round of PCR may be analyzed by electrophoresis in 1%
agarose gels. If a potential deletion product is observed in at least one of the two reactions,
two rounds of PCR are performed as described above on lysates from each individual well
derived from the column corresponding to the positive pool. This results in the
identification of a positive "address," i.e., a specific well within an individual plate,
containing a deletion mutant. The positive address is re-tested in quadruplicate using two
rounds of PCR as described above, and the product is gel purified and sequenced directly to
confirm the presence of the desired deletion.

For example, two deletions that remove the C. elegans insulin-like gene ZK75.1
have been identified, using the procedures described above, and characterized by DNA
sequencing.

Once a positive address has been identified and confirmed by sequence analysis,
approximately 300 individual worms from the relevant plate are cloned onto separate, fresh
plates. When F1 animals are present on the plate, the parent nematodes are placed into
buffer and lysed as described above. The same primer pairs and cycling conditions used to
identify the deletion are used to perform PCR on these animals. Once a single animal
carrying the deletion has been identified, its progeny are cloned and examined using the
same conditions described above, until a homozygous population of deletion animals is
obtained.

Detailed protocols which may be used for EMS mutagenesis of the genes identified
herein are set forth below.






WO 99/54436 



PCT/US99/08522 



Mutagenizing nematodes 

Plates crowded with L4 hermaphrodite worms are washed off with M9 buffer into
15 ml tubes and centrifuged. The worms are washed 2X with M9 buffer, resuspended in
9 ml of M9 buffer and transferred to a 50 ml tube.

In a chemical fume hood, 1 ml of M9 buffer and 62 µl of EMS are added to a
microfuge tube. The tube is closed and shaken to mix the M9 and EMS. The EMS/M9 mixture
is then added to the 9 ml of worms. This gives a concentration of 50 mM EMS in 10 ml of
worm suspension. The suspension is rotated on a rotation device (e.g., Nutator) for 4 hours.
After the incubation, the worms are washed 3X with M9 buffer.

The animals are plated onto plates with thick lawns of bacteria and placed at 20°C for
about 24 hours until they become full of eggs as adults. The worms are then treated with
hypochlorite to kill adults and isolate embryos (see below).

Isolating worm embryos 

The following protocol may be used to isolate mutagenized worm embryos
following the above EMS chemical treatment:

1. wash worms off plates into a 15 ml tube in a total of 15 ml sterile water
2. spin down worms 30 sec at about 15K rpm and wash 1X in water
3. rinse worms briefly in 4 ml hypochlorite solution (6.6 ml water, 400 µl 5 M KOH, 1 ml
   10% Na hypochlorite) and spin down
5. add remaining 4 ml hypochlorite solution and transfer a drop to a watch glass to observe
   the reaction under a dissecting microscope
6. as soon as adults start to burst at vulva and release embryos, adults are broken open by
   passage through a 21 gauge needle 2-3X
7. quickly fill tube with M9 buffer and spin down eggs
8. rinse 3X with M9 buffer
9. filter embryos through 52 µm mesh in 30 ml M9 into a 50 ml tube (if volume of embryos
   < 0.5 ml, embryos are resuspended in 8 ml M9 buffer in a 15 ml tube)
10. rotate embryos on nutator at 15°C overnight
11. spin down L1 larvae and plate on 3-8 large NGM plates seeded with concentrated E. coli

A typical library may contain 6668 lysates representing 2.18 million haploid
genomes.

List of primers for EMS analysis (EMS table)

Genes screened (product size)           Primer name                   Primer sequence

C06E2.N (X) (2.1 kb)                    C06E2-1 (round 1 forward)     CAAACAGTTGTAGCTCAAAGGC (SEQ ID NO:104)
                                        C06E2-4 (round 1 reverse)     GCATACGGTACCTATTCGTTTC (SEQ ID NO:105)
                                        C06E2-2 (round 2 forward)     AGCTCAAAGGCCAAATGTGTG (SEQ ID NO:106)
                                        C06E2-3 (round 2 reverse)     AACAAACCCTACAGTTACTGGG (SEQ ID NO:107)

ZK75.2/75.3 (II) (3.6 kb)               ZK75-31 (round 1 forward)     GCTATCCACCTGTCCAACCTAC (SEQ ID NO:108)
                                        ZK75-35 (round 1 reverse)     GGAGGCTCTTTACTCGCCTTAC (SEQ ID NO:109)
                                        ZK75-32 (round 2 forward)     TACAGGCTGTCCTTCTGTTACG (SEQ ID NO:110)
                                        ZK75-34 (round 2 reverse)     TCCACTATTCCGGTAATACCTC (SEQ ID NO:111)

ZK1251.N/ZK1251.2 (IV) (3.5 kb)         ZK1251-W1 (round 1 forward)   GTAAGAAATCGAGAGTCACGCC (SEQ ID NO:112)
                                        ZK1251-W4 (round 1 reverse)   GTCTTCACTATCAAACGGGAGG (SEQ ID NO:113)
                                        ZK1251-W2 (round 2 forward)   CTGCCTCAAGGAGGAGTTACAC (SEQ ID NO:114)
                                        ZK1251-W3 (round 2 reverse)   ATTTATCCCCACGTGAGAGAGG (SEQ ID NO:115)

ZK75.2/.3/.1/84.N2/84.6 (II) (12.7 kb)  ZK75-31 (round 1 forward)     see above
                                        ZK75-W4 (round 1 reverse)     CACTGGGATGACAGATTTGATG (SEQ ID NO:116)
                                        ZK75-32 (round 2 forward)     see above
                                        ZK84-3B (round 2 reverse)     TGATGAGACACGGGTGAAACG (SEQ ID NO:117)

ZK75.1/84.N2/84.6 (II) (4.7 kb)         ZK75-1F (round 1 forward)     GAACGGATAAAAAGGCGGAGC (SEQ ID NO:118)
                                        ZK75-W4 (round 1 reverse)     see above
                                        ZK75-2A (round 2 forward)     TTGATGTGACCTCCAGATGAAC (SEQ ID NO:119)
                                        ZK84-3B (round 2 reverse)     see above

M04D8.1/.2/.3 (III) (5 kb)              M04D8-1 (round 1 forward)     GCAGCACACTCTTGTTTTCAGC (SEQ ID NO:120)
                                        M04D8-4 (round 1 reverse)     CAAATCACTCACTTTCCTGCG (SEQ ID NO:121)
                                        M04D8-2 (round 2 forward)     TTCAAGTGTCCTTGTATCCGTG (SEQ ID NO:122)
                                        M04D8-3 (round 2 reverse)     GCATAGAATGGCGGAAGATCAC (SEQ ID NO:123)

F13B12.N (IV) (2.1 kb)                  F13B12-1 (round 1 forward)    CTTCCAAATTTGTCCTGACTGC (SEQ ID NO:124)
                                        F13B12-4 (round 1 reverse)    AATTGCAGGAGTCGAAGTTTCC (SEQ ID NO:125)
                                        F13B12-2 (round 2 forward)    AACGAGCAGACAGGAAATCATC (SEQ ID NO:126)
                                        F13B12-3 (round 2 reverse)    TGTGACAGCATGTTTGAACGTC (SEQ ID NO:127)

ZK75.1 (II) (3.7 kb)                    ZK75-11 (round 1 forward)     AGTTGTCAAGAAGTGCGTCAAG (SEQ ID NO:128)
                                        ZK75-1B (round 1 reverse)     GAGATGGCTTGTTGGACGAC (SEQ ID NO:129)
                                        ZK75-12 (round 2 forward)     GACAAAATCACGTCACGAAGT (SEQ ID NO:130)
                                        ZK75-13 (round 2 reverse)     TTACTTTTCTGGGCAGCAAGC (SEQ ID NO:131)

Results of an example EMS screen

The following results were obtained in an example EMS screen.

C06E2.N region: 2.3 million haploid genomes screened
ZK75.2/.3 region: 1.2 million haploid genomes screened
ZK1251.2/.N region: 1.2 million haploid genomes screened
ZK75.1 region: 800,000 haploid genomes screened
    Two confirmed deletions have been obtained in the ZK75.1 region, as follows:
    (1) ZK75.1Δ1 deletes nucleotides 15,182-17,369 of cosmid ZK75.1
    (2) ZK75.1Δ2 deletes nucleotides 15,430-17,879 of cosmid ZK75.1
ZK75.2/.3/.1/84.N2/84.6 region: 875,000 haploid genomes screened
ZK75.1/84.N2/84.6 region: 2.1 million haploid genomes screened
M04D8.1/.2/.3 region: 460,000 haploid genomes screened
F13B12.N region: 1.9 million haploid genomes screened
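
The sizes of the two confirmed ZK75.1 deletions follow directly from the reported cosmid coordinates. The short sketch below assumes the coordinate ranges are inclusive, and it estimates the deletion PCR product by subtracting the deletion from the 3.7 kb wild-type product listed for the ZK75.1 region, which holds only if both primers of the pair lie outside the deleted interval; both points are assumptions made for illustration.

```python
# Coordinates as reported for the two confirmed deletions in the ZK75.1 region.
DELETION_COORDINATES = {
    "deletion 1": (15182, 17369),
    "deletion 2": (15430, 17879),
}
WILD_TYPE_PRODUCT_BP = 3700  # 3.7 kb product listed for the ZK75.1 primer set


def deletion_length(start: int, end: int) -> int:
    """Number of deleted nucleotides, assuming inclusive coordinates."""
    if end < start:
        raise ValueError("end coordinate precedes start coordinate")
    return end - start + 1


def approximate_deletion_product(start: int, end: int) -> int:
    """Rough deletion-allele product size if both primers flank the deletion."""
    return WILD_TYPE_PRODUCT_BP - deletion_length(start, end)


if __name__ == "__main__":
    for name, (start, end) in DELETION_COORDINATES.items():
        print(name, deletion_length(start, end), "bp deleted;",
              "approx.", approximate_deletion_product(start, end), "bp product")
```

Both deletions are somewhat larger than 2 kb, so the corresponding products would be expected to migrate well below the wild-type band on a 1% agarose gel.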






EXAMPLE 5: Tc1 TRANSPOSON INSERTION MUTAGENESIS

The transposable element Tc1 may also be used as a mutagen in C. elegans since
insertion of the transposable element into a gene-of-interest can result in the inactivation of
gene function. Starting with a strain that contains a high copy number of the Tc1
transposable element in a mutator background (i.e., a strain in which the transposable
element is highly mobile), a Tc1 library containing approximately 3,000 individual cultures
is created as previously described (Id.). The library is screened for Tc1 insertions in the
region of interest using the polymerase chain reaction with one set of primers specific for
Tc1 sequence and one set of gene-specific primers. Because Tc1 exhibits a preference for
insertion within introns, it is sometimes necessary to carry out a secondary screen of
populations of insertion animals for imprecise excision of the transposable element, which
can result in deletion of part or all of the gene of interest (generally, 1-2 kb of genomic
sequence is deleted). The screen for Tc1 deletions is performed and deletion animals are
recovered in the same manner as for the EMS screen described above.

Using such procedures, C. elegans strains have been isolated that contain Tc1
transposon insertions within or neighboring the following insulin-like genes:
ZK1251.1/ZK1251.N, C06E2.N, and F13B12.N. Detailed methods are set forth below.

Tc1 library construction

A Tc1 transposon insertion library was constructed according to published protocols
by Zwaal et al., 1993, Proc. Natl. Acad. Sci. U.S.A. 90:7431-7435; and Plasterk, 1995,
Reverse Genetics: From Gene Sequence to Mutant Worm, in Caenorhabditis elegans:
Modern Biological Analysis of an Organism (Epstein and Shakes, Eds.) pp. 59-80.

Size of typical library: 3 sets of 960 cultures

Analysis of library: By sets of 960 cultures

Dimensions of set: 10 racks of 8 x 12 as follows:

Row (8): A-H
Column (12): 1-12
Plate (10): p1-p10

Culturing worms 

POUR 100-mm NGM (2X peptone) plates (2880 plates total)
SEED with E. coli in sterile hood
CULTURE 5-10 non-synchronized mut-2 (MT3126) animals per plate (250 plates/day for
12 days):
• PREPARE suspension of MT3126 in M9 buffer in dish
• TRANSFER 5 µl of suspension onto plates
• COUNT # worms on first few plates
• INCUBATE @ 20°C for 11-12 days
ADD 4 ml M9 buffer to plate
SHAKE plates O/N @ 18-20°C

Storage of worms

PREPARE Costar racks (3 racks required per 96 cultures) (90 racks total):
• MARK racks clearly on front, side, and top
• MARK individual tubes in each rack
ALIQUOT each culture into 3 racks (8 x 12) (240 cultures/day for 12 days):
• ADD few drops of fresh M9 buffer if <1 ml suspension on plate
• TRANSFER 400 µl suspension to identical positions on 2 racks (for
freezing) and remaining suspension to identical position on 3rd rack (for DNA analysis)
FREEZE 2 racks for survival:
• ADD 400 µl freezing solution to each tube:
    30% glycerol (v/v)
    25 mM KPO4, pH 6.6
    50 mM NaCl
    2.5 µg/ml cholesterol
• CLOSE tubes with sterile caps (8 caps on a strip, Costar)
• COVER rack with lid
• MIX M9 buffer and freezing solution by inverting rack several times
• WRAP racks in cotton wool and 2 towels for slow freezing O/N @ -80°C
• UNWRAP racks and store in separate freezers @ -80°C


Lysate preparation (3rd rack)

REMOVE M9 buffer supernatant from sedimented worm suspension
WASH 1X with cold H2O (960 cultures/day for 3 days)
CENTRIFUGE for 3 minutes to pellet worms and ice for 30 sec
REMOVE supernatant
(FREEZE worm pellets or LYSE directly)
ADD 200 µl Cell Lysis Solution (Centra Kit) and 2 µl Proteinase K (10 mg/ml) to each
pellet
CLOSE tubes with sterile caps (8 caps on a strip, Costar)
COVER rack with lid
INCUBATE @ 55°C for 3 hrs - O/N (invert occasionally)
STORE @ -20 or -80°C

DNA preparation

POOL lysates in 3-D matrix: Pool Rows (individual A - H by plate)
    240 pools total
    8 pools/plate
    12 lysates/pool
    pool = 240 µl
TRANSFER 20 µl of each lysate/row to a pool (80 pools/day for 3 days)
VORTEX

1-D Address: Row
    Pool Rows (cumulative A - H)
    24 pools total
    10 mixed lysates/pool
    120 lysates (total)/pool
    pool = 1.8 ml (180 µl of each mixed lysate)
    • TRANSFER 180 µl of each mixed lysate/row to a pool
    • PURIFY DNA by Centra kit (24 DNA preps)
    • RESUSPEND in TE: 10 mM Tris-HCl, 1 mM EDTA, pH 7.6
    • STORE @ -20°C

2-D Address: Plate
    240 pools total
    8 pools/plate
    12 lysates/pool
    pool = 60 µl
    88 DNA preps/day for 3 days

(This stock may be used for many searches: 10X-50X dilutions used.)



Library screening 

A library is screened in individual Tiers, each library having three Tiers. Each Tier
is composed of 1,000 lysates or 200,000 haploid genomes. Lysates are pooled according to
the above references. The first dimension screen involves PCR on 8 samples of pooled DNA
from 10 96-well plates. The second dimension screen determines on which of the 10 96-well
plates the mutant resides (involves screening of 10 DNA pools). The third dimension screen
determines the "address" of a particular mutant (i.e., in which column and row a particular
mutant resides, via screening of 12 individual lysates from a single row). First dimension
reactions are done in quadruplicate; second and third are done in triplicate.
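
The three-dimensional screen amounts to narrowing a candidate first to a row, then to a plate, then to a column. The sketch below tallies the PCR reactions per dimension for one Tier from the pool counts and replicate numbers given above and assembles the final address; the function names and the address string format are hypothetical.

```python
ROWS = tuple("ABCDEFGH")                       # first dimension: 8 cumulative row pools
PLATES = tuple(f"p{i}" for i in range(1, 11))  # second dimension: 10 plates
COLUMNS = tuple(range(1, 13))                  # third dimension: 12 lysates in one row

POOLS = {"first": len(ROWS), "second": len(PLATES), "third": len(COLUMNS)}
REPLICATES = {"first": 4, "second": 3, "third": 3}


def reactions_per_dimension() -> dict:
    """PCR reactions needed per dimension to follow one candidate through a Tier."""
    return {dimension: POOLS[dimension] * REPLICATES[dimension] for dimension in POOLS}


def resolve_address(row: str, plate: str, column: int) -> str:
    """Combine the positive row, plate, and column into a single culture address."""
    if row not in ROWS or plate not in PLATES or column not in COLUMNS:
        raise ValueError("row, plate, or column outside the library dimensions")
    return f"{plate}-{row}{column}"


if __name__ == "__main__":
    print(reactions_per_dimension())
    print(resolve_address("C", "p7", 5))
```

Under these counts, fewer than one hundred reactions are needed to locate one insertion among the 960 cultures of a set, compared with 960 reactions if every lysate were tested individually.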

Two rounds of PCR are performed; PCR is performed with a pair of gene-specific
primers and a pair of Tc1-specific primers. Two different pairs of Tc1 primers are used: one
pair points outward from the left of the transposon, and the other pair points outward from
the right (these primer pairs are described in the references cited above).

The first and second round PCR for each dimension is performed in 15 µl using the
following in each reaction:

1X PCR buffer provided by the manufacturer (Perkin Elmer)
1.5 mM MgCl2
0.2 mM dNTPs
0.5 µM of the Tc1 and the gene-specific primer
0.5 units of Perkin Elmer Taq Polymerase
H2O to 13 µl for the first round reactions, and to 15 µl for the second round

First and second dimension: 2 µl of 1:20 DNA is added; 1:10 DNA is added to the
third dimension reactions. A small amount of first round reaction is transferred to the
second round using a pin replicator. PCR cycling conditions are: 94°C for 3 minutes; then
35 cycles of 94°C for 40 seconds, 58°C for 1 minute, and 72°C for 2 minutes; then 72°C
for 2 minutes.



Genes screened                Oligo name                Oligo sequence

All                           *Tc1 L1 (round 1 left)    CGTGGGTATTCCTTGTTCGAAGCCAGCTAC (SEQ ID NO:132)
                              *Tc1 L2 (round 2 left)    TCAAGTCAAATGGATGCTTGAGA (SEQ ID NO:133)
                              *Tc1 R1 (round 1 right)   TCACAAGCTGATCGACTCGATGCCACGTCG (SEQ ID NO:134)
                              *Tc1 R2 (round 2 right)   GATTTTGTGAACACTGTGGTGAAGT (SEQ ID NO:135)

ZK75.2/.3/.1/84.N2/84.6       ZK75-31 (round 1)         SEE EMS TABLE
                              ZK75-32 (round 2)         SEE EMS TABLE
                              ZK75-35 (round 1)         SEE EMS TABLE
                              ZK75-34 (round 2)         SEE EMS TABLE
                              ZK75-1F (round 1)         SEE EMS TABLE
                              ZK75-2A (round 2)         SEE EMS TABLE
                              ZK75-W4 (round 1)         SEE EMS TABLE
                                                        SEE EMS TABLE
                              ZK75-M4 (round 1)         TTATTACATCCGTCACTGCGTC (SEQ ID NO:136)
                              ZK75-M3 (round 2)         GCGTCCTTATTCAGAATTCCAG (SEQ ID NO:137)

ZK1251.N/ZK1251.2 (IV)        ZK1251-W4 (round 1)       SEE EMS TABLE
                              ZK1251-W3 (round 2)       SEE EMS TABLE
                              ZK1251-24 (round 1)       CTTGTGACTTCAAGCCCACTTC (SEQ ID NO:138)
                              ZK1251-23 (round 2)       GGTTATGAACCGATTAGGCTCC (SEQ ID NO:139)
                              ZK1251-N1 (round 1)       GTAGCCTTCCGGGGTTAAAATC (SEQ ID NO:140)
                              ZK1251-N2 (round 2)       GATCTCGCGCTATGTTTTGAG (SEQ ID NO:141)

C06E2.N                       C06E2-1A (round 1)        GACAGCTGAAGCTGACCAAAC (SEQ ID NO:142)
                              C06E2-2A (round 2)        CAGGAGTTAAACGTGGTCACTG (SEQ ID NO:143)
                              C06E2-4 (round 1)         SEE EMS TABLE

F13B12.N (IV)                 F13B12-1 (round 1)        SEE EMS TABLE
                              F13B12-2 (round 2)        SEE EMS TABLE
                              F13B12-4 (round 1)        SEE EMS TABLE
                              F13B12-3 (round 2)        SEE EMS TABLE

M04D8.1/.2/.3 (III)           M04D8-1 (round 1)         SEE EMS TABLE
                              M04D8-4 (round 1)         SEE EMS TABLE
                              M04D8-2 (round 2)         SEE EMS TABLE
                              M04D8-3 (round 2)         SEE EMS TABLE

Results of Tc1 screen

Five confirmed Tc1 insertions have been found in or near the following C. elegans
insulin-like genes: one insertion near ZK1251.2/.N; two insertions near C06E2.N; and two
insertions in F13B12.N.



EXAMPLE 6: DOUBLE-STRANDED RNA INTERFERENCE ANALYSIS 

The function of the C. elegans insulin-like genes identified herein may be
characterized and/or determined using a method based on the interfering properties of
double-stranded RNAs derived from the coding regions of the identified genes (see Fire et
al., 1998, Potent and specific genetic interference by double-stranded RNA in
Caenorhabditis elegans, Nature 391:806-811). In this method, sense and antisense RNAs
derived from a substantial portion of a C. elegans insulin-like gene are synthesized in vitro
from phagemid DNA templates containing cDNA clones of insulin-like genes which are
inserted between opposing promoters for T3 and T7 phage RNA polymerases, or from PCR
products amplified from coding regions of insulin-like genes, where the primers used for the
PCR reactions are modified by the addition of phage T3 and T7 promoters. The resulting
sense and antisense RNAs are annealed in an injection buffer and the double-stranded RNA
is injected into C. elegans hermaphrodites. Progeny of the injected hermaphrodites are
inspected for phenotypes-of-interest. Other methods can also be employed for generating
mutant phenotypes in nematodes using single-stranded antisense DNA or RNA species, as
described above. However, single-stranded methods may be less effective in nematodes
than double-stranded RNA interference (see Guo and Kemphues, 1995, par-1, a gene
required for establishing polarity in C. elegans embryos, encodes a putative Ser/Thr kinase
that is asymmetrically distributed, Cell 81:611-620; see also Fire, 1991, Production of
antisense RNA leads to effective and specific inhibition of gene expression in C. elegans
muscle, Development 113:503-514).
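
By way of illustration only, the primer modification described in this Example (addition of phage T7 and T3 promoters to gene-specific PCR primers) might be sketched in Python as follows. The promoter strings are the commonly used minimal T7 and T3 promoter sequences; the function name and the two gene-specific primers are hypothetical and are not taken from the primer tables of this application.

```python
# Minimal sketch: prepend phage promoter sequences to gene-specific PCR primers
# so that the PCR product can serve as a template for in vitro transcription
# of both strands. Names and example primers are hypothetical.

T7_PROMOTER = "TAATACGACTCACTATAGGG"  # commonly used minimal T7 promoter
T3_PROMOTER = "AATTAACCCTCACTAAAGGG"  # commonly used minimal T3 promoter


def add_promoters(forward_primer, reverse_primer):
    """Return (T7-tagged forward primer, T3-tagged reverse primer), 5' to 3'."""
    tagged = []
    for promoter, primer in ((T7_PROMOTER, forward_primer),
                             (T3_PROMOTER, reverse_primer)):
        cleaned = primer.upper().replace(" ", "")
        if not cleaned or set(cleaned) - set("ACGT"):
            raise ValueError(f"primer contains non-ACGT characters: {primer!r}")
        tagged.append(promoter + cleaned)
    return tagged[0], tagged[1]


if __name__ == "__main__":
    # Hypothetical gene-specific primers, used only to show the call.
    fwd, rev = add_promoters("ATGTACTGGTTTCGTCAAGTTTAC",
                             "TCAATTATCGTCCTGATTGCAGC")
    print(fwd)
    print(rev)
```

Which promoter is placed on which primer is arbitrary in this sketch; either orientation yields a template from which both strands can be transcribed.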

EXAMPLE 7: MIS-EXPRESSION ANALYSIS 

Mis-expression (i.e., ectopic expression, abnormal expression) of wild-type and/or
mutant C. elegans insulin-like genes so as to create transgenic animals is another useful
method for the analysis of gene function in nematodes (Mello and Fire, 1995, DNA
transformation, Methods in Cell Biology 48:451-482). Such transgenic animals may be
created to contain gene fusions of the coding regions of insulin-like genes joined (i.e.,
operably linked) to a specific promoter whose regulation has been well characterized. Such
a specific promoter may be used as a heterologous promoter (i.e., a promoter which is not
naturally linked to the gene). Examples of promoters that can be used to drive such
mis-expression of insulin-like genes include but are not limited to: the heat shock gene
promoters hsp16-2 and hsp16-41, useful for temperature-induced expression; the myo-2
gene promoter, useful for pharyngeal muscle-specific expression; the hlh-1 gene promoter,
useful for body-muscle-specific expression; and the mec-3 gene promoter, useful for
touch-neuron-specific gene expression. Gene fusions for directing the mis-expression of
insulin-like genes are incorporated into a transformation vector which is injected into nematodes
along with a plasmid containing a dominant selectable marker, such as rol-6. Transgenic
animals are identified as those exhibiting a roller phenotype, and the transgenic animals are
inspected for additional phenotypes of interest created by mis-expression of the insulin-like
gene.

EXAMPLE 8: ANALYSIS OF MUTANT PHENOTYPES 

After isolation of nematodes carrying mutated or mis-expressed insulin-like genes,
or inhibitory RNAs, animals are carefully examined for phenotypes-of-interest. For the
situations involving deletions or Tc1 insertions in insulin-like genes, nematodes are
generated that are homozygous and heterozygous for the mutant insulin-like genes.

Examples of specific phenotypes that may be investigated include but are not limited
to: lethality, sterility, reduction in brood size, egg-laying defects, dauer constitutive, dauer
defective, increased life span, decreased life span, defective locomotion, defective
chemotaxis, defective thermotaxis, abnormal body shape, abnormal body size, and
alterations in the morphogenesis of specific organs, such as the vulva, nervous system, gut,
or musculature (see Hodgkin, 1997, Appendix I: Genetics, pp. 882-1047, in C. elegans II,
Riddle et al., Eds., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York).

EXAMPLE 9: ANALYSIS OF GENETIC INTERACTIONS AND MULTIPLE 
MUTANTS 

Another approach that may be used to probe the biological function of the
insulin-like genes identified herein is to test for genetic interactions with other genes that
may participate in the same, related, interacting, or modifying genetic or biochemical
pathways. In particular, the presence of closely-linked clusters of insulin-like genes
in the C. elegans genome raises the possibility of functional redundancy of
one or more genes. Consequently, it is of interest to investigate the phenotypes of
nematodes containing mutations (such as deletions or Tc1 insertions as described above)
that knock out the function of more than one insulin-like gene. Such strains carrying
mutations in multiple genes can be generated by cross breeding animals carrying the
individual mutations, followed by selection of progeny that carry the desired multiple
mutations. Alternatively, multiple insulin-like genes can be inactivated by the simultaneous
injection of double-stranded RNAs derived from each gene using the method of
double-stranded RNA interference described above.
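
As a rough guide to the cross-breeding route, the expected yield of double-homozygous self-progeny from a doubly heterozygous hermaphrodite can be estimated with the short Python sketch below. It is an idealized calculation and not a protocol from this application: it assumes simple Mendelian segregation, the same recombination fraction in both germ lines, and full viability of all genotypes. The function and parameter names are illustrative.

```python
import math


def double_homozygote_fraction(r=0.5, phase="trans"):
    """Expected fraction of self-progeny homozygous for both mutations.

    r     -- recombination fraction between the two loci (0.5 = unlinked)
    phase -- "trans" if the two mutations entered the cross on different
             homologs (the usual case when two single mutants are crossed),
             "cis" if they already lie on the same chromosome
    """
    if not 0.0 <= r <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    if phase == "trans":
        gamete = r / 2          # doubly mutant gametes require a crossover
    elif phase == "cis":
        gamete = (1 - r) / 2    # doubly mutant gametes are non-recombinant
    else:
        raise ValueError("phase must be 'trans' or 'cis'")
    return gamete ** 2          # both gametes must carry both mutations


def animals_to_screen(fraction, confidence=0.95):
    """Progeny to genotype to find at least one double homozygote."""
    if not 0.0 < fraction < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("fraction and confidence must lie strictly in (0, 1)")
    return math.ceil(math.log(1 - confidence) / math.log(1 - fraction))


if __name__ == "__main__":
    unlinked = double_homozygote_fraction(0.5)
    print(unlinked, animals_to_screen(unlinked))
    linked = double_homozygote_fraction(0.01, "trans")
    print(linked, animals_to_screen(linked))
```

Under these assumptions, unlinked mutations give one double homozygote in sixteen progeny, whereas tightly linked mutations brought together in trans are recovered far more rarely, which is one reason simultaneous double-stranded RNA interference may be preferable for clustered genes.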

One specific question-of-interest is genetic analysis of interactions of insulin-like
genes with other well-characterized C. elegans genes and pathways. Thus, double mutant
nematodes may be constructed that carry mutations in an insulin-like gene and another
gene-of-interest. It is of particular interest to test the interaction of the insulin-like genes with
other genes involved in the dauer formation and life span pathway, especially those that
exhibit homology to insulin signaling components in vertebrates. For example, nematodes
carrying mutations in insulin-like genes and either a loss-of-function mutation of daf-16, a
hypomorphic allele of daf-2, or a hypomorphic allele of age-1 would be of use in investigating
the involvement of different insulin-like genes in the dauer formation and life span
pathways. Also, transgenic animals mis-expressing insulin-like genes which further carry
mutations in daf-2 are of interest, e.g., for examining genetic interactions between the
insulin-like genes and the dauer formation and life span pathways. Other genetic
interactions may be tested based on the phenotypes observed for alterations of the
insulin-like genes alone. For example, if alteration of insulin-like genes produces an abnormal
body size, mutations in these insulin-like genes could be tested for interactions with other
genes that also affect body size, such as daf-4, sma-2 and sma-3.

EXAMPLE 10: GENETIC MODIFIER SCREENS 

The initial characterization of phenotypes created by mutations in single or multiple
insulin-like genes is expected to lead to the identification of nematode strains that exhibit
phenotypes appropriate for large-scale genetic modifier screens aimed at discovering other
components of the same pathway. For example, it is of particular interest to identify those
insulin-like genes that encode ligands of the daf-2 receptor. Potential daf-2 ligands
(agonists) might be revealed by the genetic interaction analysis described above as those
insulin-like genes which, when mutated alone or in combination, exhibit the following
properties: (a) a dauer constitutive phenotype similar to that observed in daf-2 mutant
animals; and (b) suppression of the dauer constitutive phenotype when insulin-like gene
mutations are tested in combination with mutations in the daf-16 gene (an antagonist of the
pathway). There are, however, many other phenotypes that could be suitable starting points
for large-scale genetic modifier screens, including a defective egg-laying phenotype, an
abnormal lipid accumulation phenotype (e.g., as revealed by staining with lipid-specific
dyes), and decreased or increased life span phenotypes.

The procedures involved in a typical genetic modifier screen are described below
(see also Huang and Sternberg, 1995, Genetic dissection of developmental pathways,
Methods in Cell Biology 48:97-122). In general, hermaphrodites carrying mutations in
insulin-like genes are exposed to a mutagen, such as EMS or trimethylpsoralen with
ultraviolet radiation. The descendants of such animals are then screened for the rare
individuals that display suppressed or enhanced versions of the original phenotype, and any
new mutations detected are presumed to alter other genes that participate in the same
phenotype-generating pathway. In a pilot-scale genetic screen, 10,000 or fewer
mutagenized nematodes would be inspected; in a moderate-scale genetic screen, about
30,000 to 100,000 mutagenized animals would be inspected; and in a large-scale genetic
screen, more than 100,000 mutagenized animals would be inspected.
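
The screen sizes given above can be expressed as a small Python helper, offered only as a bookkeeping sketch. The function name is hypothetical, and the "unclassified" label for counts between 10,000 and about 30,000 is an assumption, since the text assigns no category to that range.

```python
def screen_scale(n_mutagenized):
    """Label a modifier screen by the number of mutagenized animals inspected."""
    if n_mutagenized < 0:
        raise ValueError("animal count cannot be negative")
    if n_mutagenized <= 10_000:
        return "pilot"
    if 30_000 <= n_mutagenized <= 100_000:
        return "moderate"
    if n_mutagenized > 100_000:
        return "large"
    # The text defines no category between 10,000 and about 30,000 animals.
    return "unclassified"


if __name__ == "__main__":
    for n in (5_000, 20_000, 50_000, 250_000):
        print(n, screen_scale(n))
```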

Next, nematodes identified with suppressor or enhancer mutations are isolated, and
populations of descendants of these animals are expanded. The newly-identified "modifier"
genes that are altered by these suppressor or enhancer mutations are mapped using a
combination of genetic and molecular methods. Such newly-identified modifier mutations
may also be isolated away from the mutations in the insulin-like genes by genetic crosses;
the intrinsic phenotypes caused by the modifier mutations themselves may thus be assessed
in isolation.

Also, such newly-identified modifier mutations may be tested for genetic
interactions with other genes-of-interest using methods described above. In particular,
modifier genes may be placed into so-called complementation groups, using genetic crosses,
for subsequent examination of the phenotypes of progeny that contain two or more modifier
mutations. Two modifier mutations are said to fall within the same complementation group
if nematodes carrying both mutations exhibit essentially the same phenotype as nematodes
carrying each mutation alone. Generally, individual complementation groups defined in this
way correspond to individual genes. The precise location and sequence of the modifier gene
in the genomic DNA is confirmed by: (a) identifying sequence changes specific to the
modifier mutations within the gene in question; and (b) in most cases, demonstrating
reversion of the phenotype caused by the modifier mutation upon injection of a limited
DNA fragment containing the wild-type form of the modifier gene.
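
The bookkeeping step of sorting modifier mutations into complementation groups from pairwise cross results can be sketched in Python as below. This is an illustrative sketch only: the function name and the mutation labels are hypothetical, and the input is simply a list of pairs that the experimenter has already judged to fall in the same group by the criterion stated above.

```python
def complementation_groups(mutations, same_group_pairs):
    """Group mutations given pairs judged to lie in one complementation group.

    Uses a union-find structure so that group membership is transitive:
    if m1/m2 and m2/m3 are each scored as the same group, all three are merged.
    """
    parent = {m: m for m in mutations}

    def find(m):
        # Follow parent links to the group representative, compressing the path.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in same_group_pairs:
        parent[find(a)] = find(b)

    groups = {}
    for m in mutations:
        groups.setdefault(find(m), []).append(m)
    return sorted(sorted(members) for members in groups.values())


if __name__ == "__main__":
    # Hypothetical modifier mutations and pairwise test outcomes.
    muts = ["mod-a", "mod-b", "mod-c", "mod-d"]
    pairs = [("mod-a", "mod-b"), ("mod-b", "mod-c")]
    print(complementation_groups(muts, pairs))
```

Each resulting group would then be a candidate single gene, to be confirmed by sequencing and rescue as described in (a) and (b).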

An alternative mutagenesis-and-screening strategy, based on the use of transposable
elements as mutagens, is especially useful for the rapid identification of modifier genes
(see Anderson, 1995, Mutagenesis, Methods in Cell Biology 48:31-58).
Because the mutated modifier gene becomes tagged with sequences
derived from the transposable element, such as Tc1 as described above, this strategy allows
for easy identification of the modifier gene through PCR amplification of sequences
adjacent to the insertion site of the transposon. Mutagenesis may be carried out by
introduction of a mutator locus, termed mut-2, which promotes mobility of transposons. In
this case, the mutator locus is introduced into strains carrying mutations in insulin-like
genes, and the progeny are examined for suppression or enhancement of the original phenotype,
as described above.

Once nematode modifier genes that participate in the same pathway as insulin-like
genes have been identified using genetic screens, homologous genes in other
species-of-interest can be isolated using procedures based on cross-hybridization with C. elegans
modifier gene DNA probes, PCR-based strategies with primer sequences derived from those
of C. elegans modifier genes, and/or computer searches of sequence databases. For
therapeutic applications related to the function of insulin superfamily hormones, human and
rodent homologs of the nematode modifier genes are of particular interest. For pesticide
applications, homologs of nematode modifier genes in agriculturally-important pest species,
beneficial insects, and other invertebrate model organisms are of particular interest and
include the following: D. melanogaster, Anopheles, Heliothis virescens, Plodia
interpunctella, Spodoptera frugiperda, Pectinophora gossypiella, Plutella xylostella,
Tribolium castaneum, Diabrotica spp., Leptinotarsa decemlineata, Anthonomus grandis,
Bemisia tabaci, Myzus persicae, Blattella germanica, Apis mellifera, Ctenocephalides felis,
Amblyomma americanum, Meloidogyne spp., Heterodera glycines, etc.
WHAT IS CLAIMED IS:

1. A method of analyzing an effect of expression or mis-expression of a C.
elegans insulin-like gene comprising observing a first nematode genetically engineered to
express or mis-express a C. elegans insulin-like protein of any one of groups I, II or IV, or a
derivative or fragment thereof that displays one or more functional activities of the C.
elegans insulin-like protein.

2. The method of Claim 1, wherein the protein, derivative or fragment
comprises an amino acid sequence selected from the group consisting of SEQ ID NOs:1-15,
18, 158-161 and 198-206.

3. The method of Claim 1, wherein the protein, derivative or fragment
comprises an amino acid sequence selected from the group consisting of SEQ ID NOs:1, 6,
8, 9, 11, 12, 15, 18, 158-161 and 198-206.

4. The method of Claim 1, wherein the protein, derivative or fragment is
encoded by a nucleotide sequence selected from the group consisting of SEQ ID NOs:19-33,
36, 162-165 and 207-215.

5. The method of Claim 1, wherein the protein, derivative or fragment is
encoded by a nucleotide sequence selected from the group consisting of SEQ ID NOs:19,
24, 26, 27, 29, 30, 33, 36, 162-165 and 207-215.

6. The method of any of Claims 1-5, wherein the effect is observed in an
assay selected from the group consisting of a dauer formation assay, a developmental assay,
an energy metabolism assay, a growth rate assay and a reproductive capacity assay.

7. The method of any of Claims 1-5, wherein the C. elegans insulin-like
protein, derivative or fragment is encoded by a mutated or abnormally expressed gene and
the effect observed is the phenotype associated with the mutation or abnormal expression.

8. The method of any of Claims 1-5, wherein the gene encoding the C.
elegans insulin-like protein, derivative or fragment is caused to be mutated or abnormally
expressed.
9. The method of Claim 8, wherein the gene is mutated or abnormally
expressed using a technique selected from the group consisting of EMS chemical deletion
mutagenesis, transposon insertion mutagenesis and double-stranded RNA interference.

10. The method of Claim 9, further comprising observing a second
nematode having the same mutation or abnormal expression in the gene encoding the C.
elegans insulin-like protein as the first nematode observed, wherein the second nematode
additionally comprises a second mutation in a gene-of-interest, and wherein the effect
observed is a difference, if any, between the phenotype of the first nematode and the second
nematode, wherein a difference in phenotype identifies the gene-of-interest as capable of
modifying the function of the gene encoding the C. elegans insulin-like protein.

11. The method of Claim 10, wherein the phenotype observed is selected
from the group consisting of an altered body shape phenotype, an altered body size
phenotype, an altered chemotaxis phenotype, an altered brood size phenotype, an altered
egg-laying phenotype, an altered life span phenotype, an altered lipid accumulation
phenotype, an altered locomotion phenotype, an altered organ morphogenesis phenotype, an
altered thermotaxis phenotype, a dauer constitutive phenotype, a dauer defective phenotype,
a lethal phenotype and a sterile phenotype.

12. The method of Claim 11, wherein the phenotype observed is altered
organ morphogenesis, and wherein the organ is selected from the group consisting of vulva,
nervous system, gut and musculature.

13. The method of Claim 12, wherein the phenotype observed is altered
body size, and wherein the nematode is assayed for activity of a gene affecting body size
selected from the group consisting of daf-4, sma-2 and sma-3.

14. The method of Claim 10, wherein the gene-of-interest is a homolog of an
insulin signaling pathway gene from vertebrates.

15. The method of Claim 10, wherein the gene-of-interest is selected from
the group consisting of daf-2, daf-16 and age-1.
16. The method of Claim 15, wherein the gene-of-interest is daf-2 and the
phenotype observed is selected from the group consisting of dauer formation and life span.

17. A purified C. elegans insulin-like protein comprising or consisting of an
amino acid sequence of any one of SEQ ID NOs:1, 6, 8, 9, 11, 12, 15, 18, 158-161 or
198-206.

18. A purified derivative or fragment of the protein of Claim 17 consisting
of at least 10 contiguous amino acids of the C. elegans insulin-like protein.

19. The derivative or fragment of Claim 18 which displays one or more
functional activities of the C. elegans insulin-like protein.

20. The derivative or fragment of Claim 18 which is capable of
immunospecific binding to an antibody raised against a C. elegans insulin-like protein.

21. A purified molecule comprising the derivative or fragment of any one of
Claims 18-20.

22. A chimeric protein comprising a fragment of the C. elegans insulin-like
protein of Claim 17 consisting of at least 10 contiguous amino acids of the C. elegans
insulin-like protein fused by a covalent bond to an amino acid sequence of a second protein,
which second protein is not a C. elegans insulin-like protein.

23. A purified antibody or an antigen-binding fragment or derivative thereof
capable of immunospecific binding to the protein, derivative or fragment of any one of
Claims 17-20 and not to an insulin-like protein of another species.

24. A composition comprising the protein, derivative or fragment of any one
of Claims 17-20 and a pharmaceutically acceptable carrier.

25. The protein of Claim 17, wherein the protein further comprises a domain
depicted in any of FIGs. 4-34, wherein the domain is selected from the group consisting of a
signal peptide, a pro peptide, an A domain, a B domain, and a C domain.
26. The protein of Claim 17, wherein the protein further comprises a B
peptide domain linked by one or more disulfide bonds to an A peptide domain.

27. The protein of Claim 26, wherein said B and A peptide domains have
not been proteolytically cleaved into separate chains.

28. A mature C. elegans insulin-like protein which is the result of
expressing a nucleic acid encoding the protein of Claim 17.

29. The protein of Claim 17, wherein the C. elegans insulin-like protein is a
Class IV protein.

30. An isolated nucleic acid or a complement thereof which comprises a
heterologous nucleotide sequence of less than 15,000 nucleotides that encodes at least 10
contiguous amino acids of a C. elegans insulin-like protein of Claim 17, provided that the
isolated nucleic acid is not a cosmid.

31. The isolated nucleic acid of Claim 30, which comprises a nucleotide
sequence of any one of SEQ ID NOs:19, 24, 26, 27, 29, 30, 33, 36, 162-165 and 207-215, or
which encodes a C. elegans insulin-like protein comprising any one of SEQ ID NOs:1, 6, 8,
9, 11, 12, 15, 18, 158-161 and 198-206.

32. The isolated nucleic acid of Claim 30, which encodes one or more
domains as annotated and defined by an amino acid sequence depicted in any of FIGs. 4-34,
wherein the domain is selected from the group consisting of a signal peptide, a pro peptide,
an A domain, a B domain, and a C domain.

33. The isolated nucleic acid of Claim 30, further comprising a nucleotide
sequence encoding a functional derivative of at least a portion of an amino acid sequence
selected from the group consisting of any one of SEQ ID NOs:1-15, 18, 158-161 and
198-206.

34. A non-human animal comprising a transgene which encodes a C.
elegans insulin-like protein, derivative or fragment of Claim 18.
35. The non-human animal of Claim 34 which is a C. elegans animal and
further comprises at least one deleted or inactivated C. elegans insulin-like gene encoding
an amino acid sequence selected from the group consisting of SEQ ID NOs:1-18, 158-161
and 198-206.

36. The method of any of Claims 1-5, wherein the expression of the C.
elegans insulin-like protein is driven by a heterologous promoter.

37. The method of Claim 36, wherein the heterologous promoter is selected
from the group consisting of an hsp16-2 promoter, an hsp16-41 promoter, a myo-2
promoter, an hlh-1 promoter and a mec-3 promoter.

38. The method of Claim 37, additionally comprising contacting the
nematode with one or more molecules and determining whether the one or more molecules
alters the expression of the C. elegans insulin-like protein.

39. An isolated nucleic acid or a complement thereof which comprises a
heterologous nucleotide sequence of less than 500 nucleotides that encodes at least 10
contiguous amino acids of a C. elegans insulin-like protein of Claim 17.

40. A purified C. elegans insulin-like protein of any one of groups I, II or
IV, or a derivative or fragment thereof that displays one or more functional activities of the
C. elegans insulin-like protein, for use in insulin-related research.
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F13B12.N 

10 20 30 40 50 60 

ATGTACTGGTTTCGTCAAGTTTACAGACCCTCGTTCTTCTTTGGCTTTCTCGCGATCCTT 
TACATGACCAAAGCAGTTCAAATGTCTGGGAGCAAGAAGAAACCGAAAGAGCGCTAGGAA 
MYWFRQVYRPSFFFGFLAI L> 

SIGNAL PEPTIDE > 

MYWFRQVYRPSFFFGFLAI L> 

CODING REGION > 

70 80 90 100 110 120 

CTCCTCTCGTCGCCGACGCCTTCAGACGCATCGATTCGACTATGTGGATCACGTCTCACA 
GAGGAGAGCAGCGGCTGCGGAAGTCTGCGTAGCTAAGCTGATACACCTAGTGCAGAGTGT 

S I R L C G S R L T> 

B PEPTIDE > 

LLSSPTPSDA> 
SIGNAL PEPTIDE > 

LLSSPTPSDASI RLCGSRLT> 
CODING REGION > 

130 140 150 160 170 180 
ACAACCCTTTTAGCAGTATGCCGGAATCAGCTGTGCACTGGATTAACCGCTTTCAAACGT 
TGTTGGGAAAATCGTCATACGGCCTTAGTCGACACGTGACCTAATTGGCGAAAGTTTGCA 

K R> 
> 

TTLLAVCRNQLCTGL TAF> 

B PEPTIDE > 

TTLLAVCRNQLCTGLTAFKR>
CODING REGION > 



FIG. 4A 
190 200 210 220 230 240
TCCGCCGACCAATCCTATGCACCAACAACTCGCGATCTTTTTCACATTCACCACCAACAA
AGGCGGCTGGTTAGGATACGTGGTTGTTGAGCGCTAGAAAAAGTGTAAGTGGTGGTTGTT
S A D Q S Y A P T T R D L F H I H H Q Q>

C PEPTIDE >

SADQSYAPTTRDLFHI HHQQ>
CODING REGION >

250 260 270 280 290 300
AAGCGAGGCGGAATTGCGACAGAATGTTGTGAGAAGCGATGTTCATTTGCATATCTCAAA
TTCGCTCCGCCTTAACGCTGTCTTACAACACTCTTCGCTACAAGTAAACGTATAGAGTTT
GGIATECCEKRCSFAYLK> 

A PEPTIDE > 

K R> 
> 

KRGG I ATECCEKRCSFAYLK> 
CODING REGION > 

310 320 330 
ACATTCTGCTGCAATCAGGACGATAATTGA
TGTAAGACGACGTTAGTCCTGCTATTAACT

TFCCNQDDN*> 

A PEPTIDE > 

TFCCNQDDN*> 

CODING REGION > 



FIG.4B 
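
The three-row layout used in the sequence figures (coding strand, complementary strand, one-letter translation) can be cross-checked with a short Python sketch such as the following. The function names are illustrative and form no part of the disclosure; the standard genetic code is assumed, and the example input is the final 30 nucleotides of the F13B12.N coding region.

```python
# Minimal sketch for checking a sequence figure: compute the complementary
# strand and the one-letter translation of a coding-strand segment.

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in T, C, A, G order at each position.
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def clean(seq):
    """Upper-case a sequence and drop whitespace left over from layout."""
    return "".join(seq.split()).upper()


def complement(seq):
    """Base-by-base complement, written in the same left-to-right order."""
    return "".join(COMPLEMENT[base] for base in clean(seq))


def translate(seq):
    """Translate in frame 1; '*' marks a stop codon; trailing bases are ignored."""
    s = clean(seq)
    return "".join(CODON_TABLE[s[i:i + 3]] for i in range(0, len(s) - 2, 3))


if __name__ == "__main__":
    tail = "ACATTCTGCTGCAATCAGGACGATAATTGA"
    print(complement(tail))
    print(translate(tail))
```

A mismatch between the computed complement or translation and a printed row points to a transcription error in one of the rows rather than in the underlying sequence listing.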



WO 99/54436 



PCT/US99/08522
ZK75.1

10 20 30 40 50 60

ATGTTTTCATTCTTTACATATTTCCTTCTCTCCGCACTTCTTCTCTCCGCTTCATGTCGA
TACAAAAGTAAGAAATGTATAAAGGAAGAGAGGCGTGAAGAAGAGAGGCGAAGTACAGCT

R>
>

MFSFFTYFLLSALLLSASC>

SIGNAL PEPTIDE >

MFSFFTYFLLSALLLSASCR>
CODING REGION > 

70 80 90 100 110 120 

CAACCTTCCATGGACACCAGCAAAGCCGATCGTATTCTACGAGAGATCGAAATGGAAACA 
GTTGGAAGGTACCTGTGGTCGTTTCGGCTAGCATAAGATGCTCTCTAGCTTTACCTTTGT 

QPSMDTSKADRILRE I EMET> 
PRO PEPTIDE > 

QPSMDTSKADRI LRE I EMET> 
CODING REGION > 

130 140 150 160 170 180 

GAACTCGAAAATCAACTCTCCCGAGCACGACGAGTCCCAGCTGGAGAGGTTCGTGCCTGT 

CTTGAGCTTTTAGTTGAGAGGGCTCGTGCTGCTCAGGGTCGACCTCTCCAAGCACGGACA

ELENQLSRARR>

PRO PEPTIDE >

VPAGEVRAC>

B DOMAIN >

ELENQLSRARRVPAGEVRAC>
CODING REGION > 



FIG.5A 






190 200 210 220 230 240 

GGAAGACGACTTCTTCTCTTTGTCTGGTCAACCTGTGGAGAACCATGCACGCCACAAGAG 
CCTTCTGCTGAAGAAGAGAAACAGACCAGTTGGACACCTCTTGGTACGTGCGGTGTTCTC 

E> 
> 

GRRLLLFVWSTCGEPC TPQ> 

B DOMAIN > 

GRRLLLFVWSTCGEPCTPQE> 
CODING REGION > 

250 260 270 280 290 300 

GACATGGACATTGCCACAGTTTGCTGCACAACACAGTGCACTCCATCATATATAAAACAA 
CTGTACCTGTAACGGTGTCAAACGACGTGTTGTGTCACGTGAGGTAGTATATATTTTGTT 
DMDIATVCCTTQCTPSYIKQ>

A DOMAIN > 

DMDIATVCCTTQCTPSYIKQ> 
CODING REGION > 



310 320 
GCTTGCTGCCCAGAAAAGTAA 
CGAACGACGGGGTCTTTCATT 
A C C P E K *> 

A DOMAIN > 

A C C P E K *> 
CODING REGION >



FIG.5B 
ZK75.2

10 20 30 40 50 60

ATGAACGCTATAATCTTCTGTCTCCTCTTCACAACTGTCACTGCCACTTATGAAGTTTTC
TACTTGCGATATTAGAAGACAGAGGAGAAGTGTTGACAGTGACGGTGAATACTTCAAAAG

T Y E V F>

PRO PEPTIDE >

MNAIIFCLLFTTVTA>

SIGNAL PEPTIDE >

MNAIIFCLLFTTVTATYEVF>
CODING REGION >

70 80 90 100 110 120

GGAAAAGGAATAGAACACAGAAATGAACATTTGATCATCAATCAACTTGATATCATACCA
CCTTTTCCTTATCTTGTGTCTTTACTTGTAAACTAGTAGTTAGTTGAACTATAGTATGGT

GKGIEHRNEHLIINQLDIIP> 
PRO PEPTIDE > 

GKGIEHRNEHLIINQLDIIP> 
CODING REGION > 



130 140 150 160 170 180 
GTTGAGTCAACTCCAACTCCAAACCGTGCCTCAAGAGTCCAGAAACGTCTATGCGGAAGA 
CAACTCAGTTGAGGTTGAGGTTTGGCACGGAGTTCTCAGGTCTTTGCAGATACGCCTTCT 
VESTPTPNRASR> 
PRO PEPTIDE > 

V Q K R L C G R> 

B DOMAIN > 

VESTPTPNRASRVQKRLCGR> 
CODING REGION > 



FIG.6A 






190 200 210 220 230 240 
CGTCTTATTTTATTCATGCTTGCAACATGTGGAGAATGTGATACAGATTCATCAGAAGAC 
GCAGAATAAAATAAGTACGAACGTTGTACACCTCTTACACTATGTCTAAGTAGTCTTCTG 

S S E D> 

> 

R L I L F M L A T C G E C D T D> 

B DOMAIN > 

RL ILFMLATCGECDTDSSED> 
CODING REGION >



250 260 270 280 290 300

CTTTCGCATATTTGCTGCATAAAACAATGTGACGTTCAAGATATCATCAGAGTCTGCTGC
GAAAGCGTATAAACGACGTATTTTGTTACACTGCAAGTTCTATAGTAGTCTCAGACGACG
LSHICCIKQCDVQDIIRVCC>

A DOMAIN >

LSHICCIKQCDVQDIIRVCC>
CODING REGION >



310 320
CCGAATTCATTTAGAAAATAG
GGCTTAAGTAAATCTTTTATC
P N S F R K *>

A DOMAIN >

P N S F R K *>
CODING REGION >



FIG.6B 



ZK75.3

10 20 30 40 50 60 

ATGAAACTCTCCGTTGTTCTTGCACTTTTCATTATTTTCCAACTTGGAGCTGCAAGTCTT 
TACTTTGAGAGGCAACAAGAACGTGAAAAGTAATAAAAGGTTGAACCTCGACGTTCAGAA 

A S L> 



> 

MKLSVVLALFIIFQLGA> 
SIGNAL PEPTIDE > 



MKLSVVLALFI IFQLGAASL> 
CODING REGION > 



70 80 90 100 110 120 

ATGCGTAACTGGATGTTCGATTTTGAGAAAGAATTGGAACACGATTATGATGATTCGGAA
TACGCATTGACCTACAAGCTAAAACTCTTTCTTAACCTTGTGCTAATACTACTAAGCCTT 

MRNWMFDFEKELEHDYDDSE> 
PRO PEPTIDE > 

MRNWMFDFEKELEHDYDDSE> 
CODING REGION > 

130 140 150 160 170 180 
ATTGGATTCCATAACATTCACTCCCTGATGGCCAGATCAAGAAGAGGAGACAAAGTGAAG 
TAACCTAAGGTATTGTAAGTGAGGGACTACCGGTCTAGTTCTTCTCCTCTGTTTCACTTC 

G D K V K> 

B DOMAIN > 

IGFHN IHSLMARSRR> 

PRO PEPTIDE > 

IGFHN IHSLMARSRRGDKVK> 
CODING REGION > 



FIG.7A 






190 200 210 220 230 240 
ATTTGTGGTACAAAAGTTCTGAAAATGGTGATGGTAATGTGTGGAGGAGAATGTTCATCA 
TAAACACCATGTTTTCAAGACTTTTACCACTACCATTACACACCTCCTCTTACAAGTAGT 

ICGTKVLKMVMVMCGGECSS> 
B DOMAIN > 

ICGTKVLKMVMVMCGGECSS> 
CODING REGION >

250 260 270 280 290 300 
ACGAATGAGAACATCGCTACAGAATGCTGTGAAAAAATGTGCACAATGGAAGATATAACT 
TGCTTACTCTTGTAGCGATGTCTTACGACACTTTTTTACACGTGTTACCTTCTATATTGA 

TNENIATECCEKMCTMED I T> 
A DOMAIN > 

TNENIATECCEKMCTMED I T> 
CODING REGION > 

310 320 
ACTAAGTGCTGCCCTTCAAGATGA 
TGATTCACGACGGGAAGTTCTACT 
T K C C P S R *> 

A DOMAIN > 

T K C C P S R *> 
CODING REGION > 



FIG.7B 



ZK84.6

10 20 30 40 50 60

ATGAACTCTGTCTTTACTATCATCTTCGTTTTGTGCGCACTCCAAGTCGCTGCAAGTTTC

TACTTGAGACAGAAATGATAGTAGAAGCAAAACACGCGTGAGGTTCAGCGACGTTCAAAG

F> 
> 

MNSVFTI IFVLCALQVAAS> 
SIGNAL PEPTIDE > 

MNSVFTI I FVLCALQVAASF> 
CODING REGION > 

70 80 90 100 110 120
CGTCAATCCTTCGGTCCTTCAATGTCTGAAGAATCAGCAAGCATGCAACTTCTCCGTGAA
GCAGTTAGGAAGCCAGGAAGTTACAGACTTCTTAGTCGTTCGTACGTTGAAGAGGCACTT

RQSFGPSMSEESASMQLLRE>
PRO PEPTIDE >

RQSFGPSMSEESASMQLLRE>
CODING REGION > 

130 140 150 160 170 180 

CTTCAACACAACATGATGGAATCAGCTCACCGACCAATGCCACGAGCAAGACGTGTTCCA 

GAAGTTGTGTTGTACTACCTTAGTCGAGTGGCTGGTTACGGTGCTCGTTCTGCACAAGGT 

V P> 
> 

LQHNMMESAHRPMPRARR> 
PRO PEPTIDE > 

LQHNMMESAHRPMPRARRVP> 
CODING REGION > 



FIG.8A 



190 200 210 220 230 240
GCACCAGGAGAAACTCGTGCCTGCGGAAGAAAACTCATCTCTTTAGTCATGGCTGTCTGT
CGTGGTCCTCTTTGAGCACGGACGCCTTCTTTTGAGTAGAGAAATCAGTACCGACAGACA

APGETRACGRKLISLVMAVC>
B DOMAIN >

APGETRACGRKLISLVMAVC>
CODING REGION >

250 260 270 280 290 300 
GGAGATCTTTGCAACCCACAAGAAGGAAAGGACATTGCGACTGAATGCTGCGGAAATCAG 
CCTCTAGAAACGTTGGGTGTTCTTCCTTTCCTGTAACGCTGACTTACGACGCCTTTAGTC

EGKD I ATECCGNQ> 

A DOMAIN > 

G D L C N P Q> 
B DOMAIN > 

GDLCNPQEGKDI ATECCGNQ> 
CODING REGION > 



310 320 330 
TGTTCTGATGACTACATAAGATCTGCTTGTTGTCCATGA 
ACAAGACTACTGATGTATTCTAGACGAACAACAGGTACT 

CSDDYIRSACCP*> 
A DOMAIN > 

CSDDYIRSACCP*>
CODING REGION > 



FIG.8B 



ZK84.N2

10 20 30 40 50 60

ATGCACTCGATCGTCGCCTTGATGCTCATCGGAACAATTCTCCCAATCGCTGCTCTTCAC
TACGTGAGCTAGCAGCGGAACTACGAGTAGCCTTGTTAAGAGGGTTAGCGACGAGAAGTG
MHSIVALMLIGTILPIAA>

SIGNAL PEPTIDE >

MHSIVALMLIGTILPIAALH>

CODING REGION > 

L H> 
> 



70 80 90 100 110 120 

CAGAAGCATCAAGGCTTCATCCTGTCGTCATCCGATTCAACCGGAAACCAACCAATGGAT
GTCTTCGTAGTTCCGAAGTAGGACAGCAGTAGGCTAAGTTGGCCTTTGGTTGGTTACCTA

QKHQGF I LSSSDSTGNQPMD>
CODING REGION > 

QKHQGF I LSSSDSTGNQPMD> 
PRO PEPTIDE > 

130 140 150 160 170 180 
GCGATCTCAAGAGCCGACCGTCACACCAACTACCGATCATGCGCATTGCGGCTCATCCCG 
CGCTAGAGTTCTCGGCTGGCAGTGTGGTTGATGGCTAGTACGCGTAACGCCGAGTAGGGC 
AISRADRHTNYRSCALRL I P> 
CODING REGION > 

A I S R> 

> 

ADRHTNYRSCALRLIP> 
B DOMAIN > 



FIG.9A 



190 200 210 220 230 240
CATGTCTGGTCGGTGTGCGGTGACGCCTGCCAACCACAAAACGGAATCGATGTCGCTCAA
GTACAGACCAGCCACACGCCACTGCGGACGGTTGGTGTTTTGCCTTAGCTACAGCGAGTT

N G I D V A Q>

A DOMAIN >

HVWSVCGDACQPQNG I DVAQ>

CODING REGION >

HVWSVCGDACQPQ>
B DOMAIN > 



250 260 270 280 290 300 

AAATGTTGCTCCACTGATTGCAGCTCCGATTACATCAAAGAAATCTGCTGCCCATTTGAC 
TTTACAACGAGGTGACTAACGTCGAGGCTAATGTAGTTTCTTTAGACGACGGGTAAACTG 
KCCSTDCSSDYIKE ICCPFD> 

A DOMAIN > 

KCCSTDCSSDYIKEICCPFD> 
CODING REGION > 



TAA 
ATT 
*> 

_> 
*> 

_> 



FIG.9B 






ZK1251.2 

10 20 30 40 50 60 

ATGCCACCAATAATTTTGGTTTTCTTTTTGGTTTTAATCCCTGCTTCTCAACAATATCCT 
TACGGTGGTTATTAAAACCAAAAGAAAAACCAAAATTAGGGACGAAGAGTTGTTATAGGA 

Y P> 
> 

MPP I ILVFFLVL I PASQQ> 

SIGNAL PEPTIDE > 

MPPIILVFFLVLIPASQQYP> 

CODING REGION > 

70 80 90 100 110 120

TTTTCACTGGAGTCCTTAAATGATCAAATAATCAATGAAGAAGTAATCGAATATATGCTT
AAAAGTGACCTCAGGAATTTACTAGTTTATTAGTTACTTCTTCATTAGCTTATATACGAA

FSLESLNDQI INEEVIEYML>
PRO PEPTIDE >

FSLESLNDQI INEEVIEYML>
CODING REGION >

130 140 150 160 170 180
GAAAATTCAATTAGGTCCAGCAGAACCAGAAGAGTCCCTGACGAGAAAAAAATTTATCGT
CTTTTAAGTTAATCCAGGTCGTCTTGGTCTTCTCAGGGACTGCTCTTTTTTTAAATAGCA 

VPDEKK I YR> 

B DOMAIN > 

E N S I R S S R T R R> 
PRO PEPTIDE > 

ENSIRSSRTRRVPDEKK I YR> 
CODING REGION > 



FIG. 10A






190 200 210 220 230 240 
TGTGGAAGAAGAATACATTCGTATGTGTTTGCGGTTTGTGGAAAAGCATGCGAATCGAAT 
ACACCTTCTTCTTATGTAAGCATACACAAACGCCAAACACCTTTTCGTACGCTTAGCTTA 

CGRR I HSYVFAVCGKACESN> 
B DOMAIN > 

CGRRI HSYVFAVCGKACESN> 
CODING REGION > 



250 260 270 280 290 300 
ACTGAAGTTAATATTGCATCAAAATGTTGCCGTGAAGAATGCACCGACGACTTCATTCGA
TGACTTCAATTATAACGTAGTTTTACAACGGCACTTCTTACGTGGCTGCTGAAGTAAGCT 

TEVNIASKCCREECTDDF I R> 
A DOMAIN > 

TEVNIASKCCREECTDDF I R> 
CODING REGION > 



310 

AAACAGTGCTGTCCTTAA 
TTTGTCACGACAGGAATT 

K Q C C P *> 
A DOMAIN > 

K Q C C P *> 
CODING REG I > 



FIG. 10B
ZK1251.N

10 20 30 40 50 60 

ATGTCGCCAATCATTTTGATTTTCTTTTTGGTTTTCATTCCGTTTTCTCAACAACACACA
TACAGCGGTTAGTAAAACTAAAAGAAAAACCAAAAGTAAGGCAAAAGAGTTGTTGTGTGT 

H T> 
> 

MSPIILIFFLVFIPFSQQ>
SIGNAL PEPTIDE > 

MSPI IL IFFLVF IPFSQQHT> 
CODING REGION > 

70 80 90 100 110 120 

TCTTTAGAGGAGTCCTTAAATGATCGAATAATCAGTGAAGAAGTAGTCGAAATGCTATCA 
AGAAATCTCCTCAGGAATTTACTAGCTTATTAGTCACTTCTTCATCAGCTTTACGATAGT 

SLEESLNDRI ISEEVVEMLS> 
PRO PEPTIDE > 

SLEESLNDRI ISEEVVEMLS> 
CODING REGION > 

130 140 150 160 170 180 

GAGAAAGAAATTAGACCCAGCAGAGTAAGAAGAGTCCCTGAACAAAAAAATAAATTGTGC
CTCTTTCTTTAATCTGGGTCGTCTCATTCTTCTCAGGGACTTGTTTTTTTATTTAACACG

V P E Q K N K L C>

B DOMAIN >

EKE I R P S R V R R>

PRO PEPTIDE >

EKE IRPSRVRRVPEQKNKLC>
CODING REGION > 



FIG. 11A



190 200 210 220 230 240

GGAAAGCAAGTCTTATCCTACGTTATGGCACTTTGTGAAAAAGCATGCGATTCAAATACA
CCTTTCGTTCAGAATAGGATGCAATACCGTGAAACACTTTTTCGTACGCTAAGTTTATGT 

T> 
> 

GKQVLSYVMALCEKACDSN> 

B DOMAIN >

GKQVLSYVMALCEKACDSNT> 
CODING REGION > 

250 260 270 280 290 300 

AAAGTCGATATTGCGACAAAATGTTGCCGCGATGCATGCTCAGACGAATTCATTCGACAT 
TTTCAGCTATAACGCTGTTTTACAACGGCGCTACGTACGAGTCTGCTTAAGTAAGCTGTA 
KVDIATKCCRDACSDEF I R H> 

A DOMAIN > 

KVDIATKCCRDACSDEF IRH> 
CODING REGION > 



310 

CAATGTTGTCCTTAA 
GTTACAACAGGAATT 

Q C C P *> 
A DOMAIN > 

Q C C P *> 
—CODING R > 



FIG. 11B



C06E2.N

10 20 30 40 50 60 

ATGATCGTCACTTTGATTGTCTTTCTTGTCATTGGACTTCAAATGGCACACCTTTCTCAA 
TACTAGCAGTGAAACTAACAGAAAGAACAGTAACCTGAAGTTTACCGTGTGGAAAGAGTT 

S Q> 



MIVTLIVFLVIGLQMAHL>
SIGNAL PEPTIDE >

MIVTLIVFLVIGLQMAHLSQ>
CODING REGION >

70 80 90 100 110 120 

GTATCTGGAAACAACGAAAATGGATTCTTAAATCCATTTGATTTGTCTCAATGGAGCGAA 
CATAGACCTTTGTTGCTTTTACCTAAGAATTTAGGTAAACTAAACAGAGTTACCTCGCTT 

VSGNNENGFLNPFDLSQWSE> 
PRO PEPTIDE > 

VSGNNENGFLNPFDLSQWSE> 
CODING REGION > 

130 140 150 160 170 180 
GAAATCCTCCACCGTCAGTATCATCATCACCACCACCATCACCATGGAAATCGGGCGAGA 
CTTTAGGAGGTGGCAGTCATAGTAGTAGTGGTGGTGGTAGTGGTACCTTTAGCCCGCTCT 

E I LHRQYHHHHHHHHGNRAR> 
PRO PEPTIDE > 

E I LHRQYHHHHHHHHGNRAR> 
CODING REGION > 



FIG.12A 






190 200 210 220 230 240 
AGAACCTTGGAAACCGAAAAAATCTACCGCTGTGGAAGAAAACTCTACACTGATGTGCTA 
TCTTGGAACCTTTGGCTTTTTTAGATGGCGACACCTTCTTTTGAGATGTGACTACACGAT
R> 

_> 

TLETEKIYRCGRKLYTDVL> 

B DOMAIN > 

RTLETEKIYRCGRKLYTDVL> 
CODING REGION > 



250 260 270 280 290 300 

TCAGCGTGCAACGGGCCATGTGAACCGGGTACGGAACAGGATCTCTCTAAGCTGTGCTGT

AGTCGCACGTTGCCCGGTACACTTGGCCCATGCCTTGTCCTAGAGAGATTCGACACGACA 

T E Q D L S K L C C> 

A DOMAIN > 

SACNGPCEPG>
B DOMAIN >



SACNGPCEPGTEQDLSKLCC>
CODING REGION : > 



310 320 330 340 350 
GGAAACCAATGTACTTTCGTTGAAATCAGGAAAGCATGCTGTGCCGACAAATTGTAA
CCTTTGGTTACATGAAAGCAACTTTAGTCCTTTCGTACGACACGGCTGTTTAACATT 

GNQCTFVE IRKACCADKL*> 
A DOMAIN > 

GNQCTFVE IRKACCADKL*>
CODING REGION > 



FIG.12B 



C17C3.4

10 20 30 40 50 60 

ATGTCTAGTTACCGTCAAACATTGTTCATTCTTATTATTCTTATTGTAATTATTCTCTTC 
TACAGATCAATGGCAGTTTGTAACAAGTAAGAATAATAAGAATAACATTAATAAGAGAAG 

MSSYRQTLFILIILIVIILF> 
SIGNAL PEPTIDE >

M S S Y R Q T L F I L I I L I V I I L F>
CODING REGION > 



70 80 90 100 110 120 
GTCAATGAGGGTCAAGGAGCGCCTCACCATGACAAACGGCACACTGCATGCGTCCTAAAG
CAGTTACTCCCAGTTCCTCGCGGAGTGGTACTGTTTGCCGTGTGACGTACGCAGGATTTC 
APHHDKRHTACVLK> 
B DOMAIN > 

V N E G Q G> 
SIGNAL PEPT >

VNEGQGAPHHDKRHTACVLK> 
CODING REGION . > 



130 140 150 160 170 180 
ATTTTCAAGGCGCTAAACGTTATGTGTAATCATGAAGGTGATGCAGATGTTCTGAGGAGA
TAAAAGTTCCGCGATTTGCAATACACATTAGTACTTCCACTACGTCTACAAGACTCCTCT 

V L R R> 

> 

IFKALNVMCNHEGDAD> 
B DOMAIN > 



I FKALNVMCNHEGDADVLRR>
CODING REGION >



190 200 210 220 230 240 
ACAGCATCCGACTGCTGTCGGGAGAGCTGCTCGCTAACAGAAATGTTAGCGAGCTGCACC 
TGTCGTAGGCTGACGACAGCCCTCTCGACGAGCGATTGTCTTTACAATCGCTCGACGTGG 

TASDCCRESCSLTEMLASCT> 
A DOMAIN > 

TASDCCRESCSLTEMLASCT> 
CODING REGION >



250 260 270 
CTCACCAGCTCAGAAGAGTCAACTCGGGACATTTAA 
GAGTGGTCGAGTCTTCTCAGTTGAGCCCTGTAAATT 

LTSSEESTRDI*> 
A DOMAIN > 

LTSSEESTRDI*>
CODING REGION > 



FIG. 13 



C17C3.N

10 20 30 40 50 60 

ATGCAATCAAACATCACCGCTTCATTATTCATAGCGTTGCTTATATTTGGAGTAATCAGT
TACGTTAGTTTGTAGTGGCGAAGTAATAAGTATCGCAACGAATATAAACCTCATTAGTCA 

MQSNITASLFIALLIFGVIS>
SIGNAL PEPTIDE > 

M Q S N I T A S L F I A L L I F G V I S> 
CODING REGION > 



70 80 90 100 110 120 

GCAGCTCCATCTCATGAAAAAACACACAAAAAATGCTCTGATAAATTATATTTGGCGATG 
CGTCGAGGTAGAGTACTTTTTTGTGTGTTTTTTACGAGACTATTTAATATAAACCGCTAC 
APSHEKTHKKCSDKLYLAM> 

B DOMAIN > 

A> 

> 

AAPSHEKTHKKCSDKLYLAM> 
CODING REGION > 



130 140 150 160 170 180 
AAGTCGTTGTGTAGTTATCGAGGTTATAGTGAATTCTTAAGAAATTCTGCAACTAAGTGT 
TTCAGCAACACATCAATAGCTCCAATATCACTTAAGAATTCTTTAAGACGTTGATTCACA

FLRNSATKC>
A DOMAIN > 



KSLCSYRGYSE> 
B DOMAIN > 



KSLCSYRGYSEFLRNSATKC>
CODING REGION > 



190 200 210 220 230 240 
TGCCAAGACAATTGTGAGATTTCGGAAATGATGGCGTTGTGTGTTGTTGCTCCCAATTTT
ACGGTTCTGTTAACACTCTAAAGCCTTTACTACCGCAACACACAACAACGAGGGTTAAAA

CQDNCE ISEMMALCVVAPNF> 
A DOMAIN > 

CQDNCE ISEMMALCVVAPNF> 
CODING REGION > 



250 260 
GACGACGATCTCCTTCATTAA 
CTGCTGCTAGAGGAAGTAATT 

D D D L L H *> 
A DOMAIN > 

D D D L L H *> 
-CODING REGION > 

FIG. 14 



M04D8.1

10 20 30 40 50 60 

ATGAAAACCTACTCATTTTTCGTGCTTTTTATTGTATTCATCTTTTTTATTTCTTCATCA
TACTTTTGGATGAGTAAAAAGCACGAAAAATAACATAAGTAGAAAAAATAAAGAAGTAGT 

S> 
> 

MKTYSFFVLFIVFIFFISS> 
SIGNAL PEPTIDE > 

MKTYSFFVLF IVF IFF ISSS> 
CODING REGION >

70 80 90 100 110 120 

AAATCTCATTCAAAGAAACATGTTCGTTTCCTTTGTGCAACAAAAGCGGTCAAACACATT 
TTTAGAGTAAGTTTCTTTGTACAAGCAAAGGAAACACGTTGTTTTCGCCAGTTTGTGTAA 

KSHSKKHVRF LCATKAVKH I> 
B DOMAIN > 

KSHSKKHVRFLCATKAVKH I> 
CODING REGION > 

130 140 150 160 170 180 
CGGAAAGTATGCCCTGATATGTGTCTCACTGGAGAAGAAGTCGAAGTCAATGAGTTTTGC 
GCCTTTCATACGGGACTATACACAGAGTGACCTCTTCTTCAGCTTCAGTTACTCAAAACG 

EVEVNEFC>
A DOMAIN > 

RKVCPDMCL TGE> 
B DOMAIN > 



RKVCPDMCLTGEEVEVNEFC>
CODING REGION > 

190 200 210 220 230 
AAGATGGGGTACTCGGATTCTCAAATCAAGTACATTTGCTGTCCCGAATAA 
TTCTACCCCATGAGCCTAAGAGTTTAGTTCATGTAAACGACAGGGCTTATT
KMGYSDSQIKYICCPE*> 

A DOMAIN > 

KMGYSDSQIKYICCPE*>
CODING REGION > 



FIG. 15 



M04D8.2

10 20 30 40 50 60 

ATGCACACTACAACTATTCTCATATGCTTTTTCATCTTTCTTGTTCAAGTCTCCACAATG
TACGTGTGATGTTGATAAGAGTATACGAAAAAGTAGAAAGAACAAGTTCAGAGGTGTTAC 

M> 
> 

M H T T T I L I C F F I F L V Q V S T>

SIGNAL PEPTIDE > 

M H T T T I L I C F F I F L V Q V S T M> 

CODING REGION >



70 80 90 100 110 120 

GATGCTCACACTGACAAATACGTCAGAACTCTGTGTGGAAAAACTGCAATCAGAAATATT 
CTACGAGTGTGACTGTTTATGCAGTCTTGAGACACACCTTTTTGACGTTAGTCTTTATAA 

DAHTDKYVRTLCGKTAI RN I> 
B DOMAIN > 

DAHTDKYVRTLCGKTAI RN I> 
CODING REGION > 



130 140 150 160 170 180 
GCCAACCTTTGCCCGCCAAAGCCAGAAATGAAGGGTATCTGTTCTACCGGAGAGTATCCA 
CGGTTGGAAACGGGCGGTTTCGGTCTTTACTTCCCATAGACAAGATGGCCTCTCATAGGT 

Y P> 
> 

ANLCPPKPEMKG I CSTGE> 
B DOMAIN > 



ANLCPPKPEMKG I CSTGEYP> 
CODING REGION > 



190 200 210 220 230 240 

AGCATCACCGAATACTGTTCCATGGGATTTTCAGACTCTCAGATCAAGTTTATGTGCTGT 
TCGTAGTGGCTTATGACAAGGTACCCTAAAAGTCTGAGAGTCTAGTTCAAATACACGACA 

SITEYCSMGFSDSQIKFMCC>
A DOMAIN >

SITEYCSMGFSDSQIKFMCC>
CODING REGION > 



250 

GATAACCAATGA 
CTATTGGTTACT 
D N Q *> 



D N Q *> 



FIG. 16 
M04D8.3

10 20 30 40 50 60 

ATGTTCGTTCTTCTTATTATTCTCTCTATCATTCTGGCTCAAGTCACTGATGCTCATTCA 
TACAAGCAAGAAGAATAATAAGAGAGATAGTAAGACCGAGTTCAGTGACTACGAGTAAGT 

Q V T D A H S> 

B DOMAIN > 

MFVLLI ILSI ILA> 

SIGNAL PEPTIDE > 

MFVLLI ILSI ILAQVTDAHS> 
CODING REGION > 

70 80 90 100 110 120 

GAGCTTCACGTTCGTAGGGTGTGCGGAACTGCTATCATAAAGAACATAATGCGATTGTGC 
CTCGAAGTGCAAGCATCCCACACGCCTTGACGATAGTATTTCTTGTATTACGCTAACACG

ELHVRRVCGTAI IKNIMRLC>
B DOMAIN >

ELHVRRVCGTAI IKNIMRLC>



130 140 150 160 170 180 
CCAGGGGTACCGGCTTGCGAAAATGGAGAAGTTCCAAGTCCAACCGAGTACTGTTCAATG 
GGTCCCCATGGCCGAACGCTTTTACCTCTTCAAGGTTCAGGTTGGCTCATGACAAGTTAC 

V P S P T E Y C S M> 

A DOMAIN > 

PGVPACENGE> 
B DOMAIN > 



PGVPACENGEVPSPTEYCSM> 
CODING REGION > 

190 200 210 220 230 
GGGTACTCAGACAGCCAGGTAAAATACCTATGCTGTCCAACTTCTCAGTGA 
CCCATGAGTCTGTCGGTCCATTTTATGGATACGACAGGTTGAAGAGTCACT 
GYSDSQVKYLCCPTSQ*>

A DOMAIN >

GYSDSQVKYLCCPTSQ*>
CODING REGION > 



FIG. 17 
ZK84.N

10 20 30 40 50 60 

ATGGACAAACCATCCTACCTGTCATCCAAAGAAGCATGGAAAATGCTAAATGAGCTGCTG 
TACCTGTTTGGTAGGATGGACAGTAGGTTTCTTCGTACCTTTTACGATTTACTCGACGAC 

M D K P S Y L S S K E A W K M L N E L L> 
CODING REGION >

MDKPSYLSSKEAWKMLNELL>
SIGNAL PEPTIDE > 



70 80 90 100 110 120 

AAAGAGCCGAAACATCATCATCATCATCACAGGCACAAAGGATATTGTGGAGTTAAAGCT 
TTTCTCGGCTTTGTAGTAGTAGTAGTAGTGTCCGTGTTTCCTATAACACCTCAATTTCGA 

KEPKHHHHHHRHKGYCGVKA> 
B DOMAIN > 

KEPKHHHHHHRHKGYCGVKA> 
CODING REGION > 



130 140 150 160 170 180 
GTAAAGAAATTAAAACAAATCTGTCCAGATCTTTGCTCGAATGTTGATGATAACCTTCTC
CATTTCTTTAATTTTGTTTAGACAGGTCTAGAAACGAGCTTACAACTACTATTGGAAGAG

N L L> 

> 

VKKLKQICPDLCSNVDD> 
B DOMAIN > 



VKKLKQ ICPDLCSNVDDNLL> 
CODING REGION > 



190 200 210 220 230 240 
ATGGAAATGTGCTCAAAAAACCTGACGGATGATGATATTTTGCAACGGTGCTGTCCAGAA
TACCTTTACACGAGTTTTTTGGACTGCCTACTACTATAAAACGTTGCCACGACAGGTCTT
MEMCSKNLTDDDI LQRCCPE>

A DOMAIN > 

MEMCSKNLTDDDI LQRCCPE> 
CODING REGION > 



TGA 
ACT 
*> 

__> 
*> 

_> 
F56F3.6

10 20 30 40 50 60 

ATGTTCTCGACCAGAGGGGTACTCCTTTTACTGTCTTTGATGGCTGCTGTAGCCGCATTC 
TACAAGAGCTGGTCTCCCCATGAGGAAAATGACAGAAACTACCGACGACATCGGCGTAAG 

F> 
> 

MFSTRGVLLLLSLMAAVAA> 
SIGNAL PEPTIDE > 

MFSTRGVLLLLSLMAAVAAF> 
CODING REGION > 



70 80 90 100 110 120 
GGGCTGTTTTCTAGACCGGCTCCAATCACTCGGGACACTATCCGACCACCACGTGCCAAA 
CCCGACAAAAGATCTGGCCGAGGTTAGTGAGCCCTGTGATAGGCTGGTGGTGCACGGTTT 
GLFSRPAPI TRDTIRPPRAK> 
PRO PEPTIDE > 

GLFSRPAP I TRDT IRPPRAK> 
CODING REGION > 



130 140 150 160 170 180 
CACGGTTCGCTGAAATTATGCCCACCAGGTGGTGCCTCATTCCTTGACGCTTTCAACTTG 
GTGCCAAGCGACTTTAATACGGGTGGTCCACCACGGAGTAAGGAACTGCGAAAGTTGAAC
H> 

_> 

HGSLKLCPPGGASFLDAFNL> 

CODING REGION > 

B DOMAIN > 
190 200 210 220 230 240
ATTTGCCCAATGCGCCGTCGACGCAGGAGTGTTTCAGAAAACTACAACGACGGCGGTGGC
TAAACGGGTTACGCGGCAGCTGCGTCCTCACAAAGTCTTTTGATGTTGCTGCCGCCACCG
I CPMRRRRRSVSENYNDGGG>

CODING REGION > 

B DOMAIN > 

> 



250 260 270 280 290 300 
AGCCTTTTGGGACGGACAATGAATATGTGCTGTGAGACGGGATGTGAATTCACTGACATT
TCGGAAAACCCTGCCTGTTACTTATACACGACACTCTGCCCTACACTTAAGTGACTGTAA
SLLGRTMNMCCETGCEFTDI>

CODING REGION > 

A DOMAIN > 

310 320 
TTCGCAATCTGCAATCCTTTTGGATAA 
AAGCGTTAGACGTTAGGAAAACCTATT 
F A I C N P F G *> 

CODING REGION > 

A DOMAIN > 



FIG.19B 
T28B8.N

10 20 30 40 50 60 

ATGGTCCACCGACTTTTCATCGTCCTTATTGCAATTATTCTTGTCGCAAAATCAACTGCA 
TACCAGGTGGCTGAAAAGTAGCAGGAATAACGTTAATAAGAACAGCGTTTTAGTTGACGT
MVHRLFIVLIAI ILVAKSTA>

SIGNAL PEPTIDE > 

MVHRLFIVLIAI I L V A K S T A> 
CODING REGION >

70 80 90 100 110 120 

ATCTCACTTCAACAAGCTGACGGACGCATGAAAATGTGCCCACCAGGTGGTTCAACATTC
TAGAGTGAAGTTGTTCGACTGCCTGCGTACTTTTACACGGGTGGTCCACCAAGTTGTAAG 

I SLQQADGRMKMCPPGGSTF> 
B DOMAIN > 

I SLQQADGRMKMCPPGGSTF> 
CODING REGION > 

130 140 150 160 170 180 
ACAATGGCATGGTCAATGTCGTGTTCGATGCGCAGGAGAAAACGAGATGTTGGACGATAT
TGTTACCGTACCAGTTACAGCACAAGCTACGCGTCCTCTTTTGCTCTACAACCTGCTATA 

TMAWSMSCSMRRRKRDV GRY> 
B DOMAIN > 

TMAWSMSCSMRRRKRDVGRY> 
CODING REGION > 



190 200 210 220 230 240 
TTCGAAAAACGTGCTCTGATCGCCCCATCAATCCGTCAACTTCAAACAATTTGCTGTCAA 
AAGCTTTTTGCACGAGACTAGCGGGGTAGTTAGGCAGTTGAAGTTTGTTAAACGACAGTT 
F E> 
> 

FEKRAL IAPSIRQLQT ICCQ> 

CODING REGION > 

KRAL IAPSIRQLQT ICCQ> 
A DOMAIN > 



250 260 270 280 
GTTGGTTGCAACGTGGAAGATCTTCTTGCCTACTGTGCCCCAATTTAA 
CAACCAACGTTGCACCTTCTAGAAGAACGGATGACACGGGGTTAAATT
VGCNVEDLLAYCAPI*> 

CODING REGION > 

VGCNVEDLLAYCAPI *> 
A DOMAIN > 



FIG. 20 




ZC334.N 

10 20 30 40 50 60 

ATGAAATTCTTCCGCTTAATCTTGCTCTGCGCCCTTGTCCTGACCACCATGGCTTTTTTG
TACTTTAAGAAGGCGAATTAGAACGAGACGCGGGAACAGGACTGGTGGTACCGAAAAAAC 
MKFFRL I LLCALVL TTMA> 

SIGNAL PEPTIDE > 

MKFFRL I LLCALVLTTMAFL>

CODING REGION > 

F L> 
> 

70 80 90 100 110 120 

GCTCCAAGTACGGCAGCCAAGAGGCGTTGTGGCCGCCGCTTAATTCCCTATGTCTATTCA 
CGAGGTTCATGCCGTCGGTTCTCCGCAACACCGGCGGCGAATTAAGGGATACAGATAAGT 

APSTAAKRRCGRRL IPYVYS> 
CODING REGION > 

APSTAAKRRCGRRL IPYVYS> 
B DOMAIN > 



130 140 150 160 170 180 

ATATGCGGCGGCCCGTGCGAGAATGGAGATATTATCATCGAGCACTGCTTCTCCGGAACA 
TATACGCCGCCGGGCACGCTCTTACCTCTATAATAGTAGCTCGTGACGAAGAGGCCTTGT 

ICGGPCENGDI I IEHCFSGT> 
CODING REGION > 

ICGGPCENGD> 
B DOMAIN > 

I I I E H C F S G T> 
A DOMAIN > 

190 200 210 220 230 240 

ACTCCCACCATTGCCGAAGTCCAAAAGGCTTGCTGTCCTGAACTATCTGAAGACCCAACT 
TGAGGGTGGTAACGGCTTCAGGTTTTCCGAACGACAGGACTTGATAGACTTCTGGGTTGA 

TPT IAEVQKACCPELSEDPT> 
CODING REGION > 

TPT IAEVQKACCPELSEDPT> 
A DOMAIN > 

250 

TTCTCATCTTAA 
AAGAGTAGAATT 

F S S *> 
> 

F S S *> 
> 



FIG.21 
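
Each sequence block in these figures stacks a coding strand, its base-paired complementary strand, and the one-letter translation of the coding strand. The following is a minimal sketch, not part of the original disclosure, of how those three rows can be cross-checked. It assumes the standard genetic code and reading frame 1; the names `complement` and `translate` are illustrative only.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(dna: str) -> str:
    """Return the base-paired strand, written in the same left-to-right order."""
    return dna.translate(COMPLEMENT)


def translate(dna: str) -> str:
    """Translate frame 1 of a coding strand; '*' marks a stop codon."""
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Last block of FIG. 21 (ZC334.N), positions 241-252.
top = "TTCTCATCTTAA"
print(complement(top))  # AAGAGTAGAATT
print(translate(top))   # FSS*
```

Where a printed strand is damaged, the complementary row and the translation row give two independent checks on each base.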
T08G5.N

10 20 30 40 50 60 

ATGTCACTGCATTTCTCCACTATTCAAAAAACAATTCTTCTAATCTCATTCTTGCTCCTC 
TACAGTGACGTAAAGAGGTGATAAGTTTTTTGTTAAGAAGATTAGAGTAAGAACGAGGAC 
M S L H F S T I Q K T I L L I S F L L L>

SIGNAL PEPTIDE >

M S L H F S T I Q K T I L L I S F L L L>
CODING REGION > 

70 80 90 100 110 120 
GTAACATTGGCTCCCAGAACAAGTGCAGCTTTTCCATTCCAAATTTGTGTCAAAAAAATG
CATTGTAACCGAGGGTCTTGTTCACGTCGAAAAGGTAAGGTTTAAACACAGTTTTTTTAC 
V T L A P R T S A> 
SIGNAL PEPTIDE > 

VTLAPRTSAAFPFQICVKKM> 

CODING REGION > 

AFPFQICVKKM> 
B DOMAIN > 

130 140 150 160 170 180 

GAAAAAATGTGCAGAATCATCAATCCAGAGCAGTGTGCACAAGTAAATAAAATCACTGAG 
CTTTTTTACACGTCTTAGTAGTTAGGTCTCGTCACACGTGTTCATTTATTTTAGTGACTC

EKMCRI INPEQCAQVNKITE> 

CODING REGION > 

EKMCRI INPEQCAQVNKITE> 
B DOMAIN > 



190 200 210 220 230 240 
ATTGGAGCATTGACAGACTGTTGCACCGGACTGTGCTCCTGGGAAGAAATCCGGATCTCC 
TAACCTCGTAACTGTCTGACAACGTGGCCTGACACGAGGACCCTTCTTTAGGCCTAGAGG 

IGALTDCCTGLCSWEEIRIS> 
CODING REGION > 

I G> 
> 

A L T D C C T G L C S W E E I R I S> 
A DOMAIN > 



250 

TGCTGCTCCGTTTTATAA 
ACGACGAGGCAAAATATT 
C C S V L *> 

CODING REG I > 

C C S V L> 
_A DOMAIN > 
F41G3.N

10 20 30 40 50 60 

ATGCTCACACATCTGAAATTCTTGCTTCTAGTGAGCCTTTTTATCAACTTCGCCGTAAGC 
TACGAGTGTGTAGACTTTAAGAACGAAGATCACTCGGAAAAATAGTTGAAGCGGCATTCG 

MLTHLKFLLLVSLF I N F A V S> 
SIGNAL PEPTIDE > 

MLTHLKFLLLVSLF I N F A V S> 
CODING REGION > 



70 80 90 100 110 120 

TCTGAAGACATCAAATGCGATGCAAAGTTCATTTCGAGAATCACGAAACTCTGTATTCAC 
AGACTTCTGTAGTTTACGCTACGTTTGAAGTAAAGCTCTTAGTGCTTTGAGACATAAGTG 

SED I KCDAKF I SR I TKLC I H> 
B DOMAIN > 

SED I KCDAKF ISRI TKLC I H> 
CODING REGION > 



130 140 150 160 170 180 
GGAATTACTGAAGATAAACTTGTTCGTCTTCTCACAAGATGCTGCACATCTCACTGCTCC 
CCTTAATGACTTCTATTTGAACAAGCAGAAGAGTGTTCTACGACGTGTAGAGTGACGAGG 
G I T E D K> 
B DOMAIN > 

LVRLLTRCCTSHCS> 

A DOMAIN > 

G I TEDKLVRLLTRCCTSHCS> 
CODING REGION > 



190 200 210 220 230 240 

AAAGCTCATCTGAAAATGTTCTGCACCCTGAAACCTCACGAAGAAGAACCACATCACGAA 
TTTCGAGTAGACTTTTACAAGACGTGGGACTTTGGAGTGCTTCTTCTTGGTGTAGTGCTT 

KAHLKMFCTLKPHEEEPHHE> 
A DOMAIN > 

KAHLKMFCTLKPHEEEPHHE> 
CODING REGION > 



ATCTAA 
TAGATT 
I> 

_> 
I *> 
> 
F41G3.N2

10 20 30 40 50 60 

ATGAAGCTTCTTCCTCTCATTGTGGTTTTTGCTCTTTTGGCAGTCATATCAGAATCATAT 
TACTTCGAAGAAGGAGAGTAACACCAAAAACGAGAAAACCGTCAGTATAGTCTTAGTATA 

MKLLPL IVVFALLAVI SESY> 
SIGNAL PEPTIDE > 

MKLLPL IVVFALLAVI SESY> 
CODING REGION > 



70 80 90 100 110 120 

TCTGGAAATGACTTCCAACCTCGTGACAATAAACATCATTCCTATCGTTCATGTGGGGAA 
AGACCTTTACTGAAGGTTGGAGCACTGTTATTTGTAGTAAGGATAGCAAGTACACCCCTT 
GNDFQPRDNKHHSYRSCGE> 

B DOMAIN > 

S> 
_> 

SGNDFQPRDNKHHSYRSCGE> 
CODING REGION > 

130 140 150 160 170 180 
TCGTTGAGCCGACGAGTTGCATTTCTGTGTAATGGTGGAGCTATTCAAACAGAAATACTA 
AGCAACTCGGCTGCTCAACGTAAAGACACATTACCACCTCGATAAGTTTGTCTTTATGAT 
SLSRRVAFLCNGGAIQT> 
B DOMAIN > 

E I L> 
> 

SLSRRVAFLCNGGA IQTE I L> 
CODING REGION > 

190 200 210 220 230 240 

AGAGCTCTGGATTGTTGTTCCACTGGTTGTACGGACAAACAGATCTTTTCTTGGTGTGAT 
TCTCGAGACCTAACAACAAGGTGACCAACATGCCTGTTTGTCTAGAAAAGAACCACACTA

RALDCCSTGCTDKQ I FSWCD> 
A DOMAIN > 

RALDCCSTGCTDKQ I FSWCD> 
CODING REGION > 



250 

TTTCAAATTTGA 
AAAGTTTAAACT 

F Q I> 
> 

F Q I *>
> 
C17C3.N2

10 20 30 40 50 60 

ATGAAGCTTTTACATATTTTTATTATTTTTCTGTTATTCCAATCGTGCTCTAATAAAATG 
TACTTCGAAAATGTATAAAAATAATAAAAAGACAATAAGGTTAGCACGAGATTATTTTAC 

N K M> 
> 

MKLLHIF I IFLLFQSCS> 

SIGNAL PEPTIDE > 

MKLLHIF I IFLLFQSCSNKM> 
CODING REGION > 

70 80 90 100 110 120 

TGTCAATATTCAAAGAAAAAGTACAAGATTTGTGGAGTTAGAGCTCTTAAGCATATGAAA 
ACAGTTATAAGTTTCTTTTTCATGTTCTAAACACCTCAATCTCGAGAATTCGTATACTTT 
CQYSKKKYK ICGVRALKHMK> 

B DOMAIN > 

CQYSKKKYKICGVRALKHMK> 



130 140 150 160 170 180 
GTCTATTGTACACGTGGAATGACAAGAGATTATGGAAAATTACTCGTGACTTGTTGTTCG 
CAGATAACATGTGCACCTTACTGTTCTCTAATACCTTTTAATGAGCACTGAACAACAAGC 
VYCTRGMTRD> 
B DOMAIN > 

YGKLLVTCCS> 

A DOMAIN > 

VYCTRGMTRDYGKLLVTCCS> 
CODING REGION > 

190 200 210 220 

AAAGGATGTAATGCAATAGATATCCAACGTATTTGTTTATGA 
TTTCCTACATTACGTTATCTATAGGTTGCATAAACAAATACT 
KGCNAIDIQRICL> 

A DOMAIN > 

KGCNA I D IQR ICL*> 
CODING REGION > 
ZC334.N2

10 20 30 40 50 60 

ATGAGATCTCCCACCTTGTTTCTTCTTCTGCTCCTAGTGCCCCTGGCACTATGCCATGTC 
TACTCTAGAGGGTGGAACAAAGAAGAAGACGAGGATCACGGGGACCGTGATACGGTACAG 

MRSPTLFLLLLLVPLALCHV> 
CODING REGION > 

MRSPTLFLLLLLVPLALC>
SIGNAL PEPTIDE > 

H V> 
> 

70 80 90 100 110 120 

TTCTCGGAGCCCGCGGATTTGGAGCTCAAAAGCTACCAAGCGCTTGAAAAAAGCCTCAAG 
AAGAGCCTCGGGCGCCTAAACCTCGAGTTTTCGATGGTTCGCGAACTTTTTTCGGAGTTC 

FSEPADLELKSYQALEKSLK> 
CODING REGION > 

FSEPADLELKSYQALEKSLK> 
B DOMAIN > 

130 140 150 160 170 180 

GAGATGGGACTCATTCGAGCCAACCAGGGACCTCAAAAAGCGTGCGGACGATCAATGATG 
CTCTACCCTGAGTAAGCTCGGTTGGTCCCTGGAGTTTTTCGCACGCCTGCTAGTTACTAC 

EMGL IRANQGPQKACGRSMM> 
CODING REGION > 

EMGL I RANQGPQKACGRSMM> 
B DOMAIN > 



FIG.26A 
190 200 210 220 230 240

ATGAAGGTGCAGAAGCTTTGCGCGGGCGGATGCACAATTCAGAACGACGATCTTACCATC 
TACTTCCACGTCTTCGAAACGCGCCCGCCTACGTGTTAAGTCTTGCTGCTAGAATGGTAG 

MKVQKLCAGGCT IQNDDLT I> 
CODING REGION > 

MKVQKLCAGGCT I Q N D D> 
B DOMAIN > 

L T I> 
> 



250 260 270 280 290 300 

AAATCCTGCAGTACTGGGTACACCGATGCCGGCTTCATCTCGGCCTGCTGCCCATCTGGC 
TTTAGGACGTCATGACCCATGTGGCTACGGCCGAAGTAGAGCCGGACGACGGGTAGACCG 

KSCSTGYTDAGFISACCPSG>
CODING REGION >

KSCSTGYTDAGFISACCPSG>
A DOMAIN > 



310 

TTCGTTTTCTAA 
AAGCAAAAGATT 
F V F *> 



F V F> 
> 
ZC334.N3

10 20 30 40 50 60 

ATGTTGTTCAAAATCATCATTTTATTTTTCCTGCTCCTCCAGCTTTCTGAAGCCAAACCG
TACAACAAGTTTTAGTAGTAAAATAAAAAGGACGAGGAGGTCGAAAGACTTCGGTTTGGC

MLFKI I I LFFLLLQLSEAKP>
CODING REGION >

M L F K I I I L F F L L L Q L S E A> 
SIGNAL PEPTIDE > 

K P> 
> 

70 80 90 100 110 120

GAAGCCCAGAGGCGCTGCGGCCGGTATTTAATTCGTTTTTTGGGGGAACTGTGTAATGGT
CTTCGGGTCTCCGCGACGCCGGCCATAAATTAAGCAAAAAACCCCCTTGACACATTACCA

EAQRRCGRYL IRFLGELCNG>
CODING REGION >

EAQRRCGRYL IRFLGELCNG>
B DOMAIN > 

130 140 150 160 170 180 
CCCTGCTCAGGAGTTTCAAGCGTTGACATTGCCACAATTGCCTGTGCAACCGCCGTCCCA
GGGACGAGTCCTCAAAGTTCGCAACTGTAACGGTGTTAACGGACACGTTGGCGGCAGGGT

PCSGVSSVDIATIACATAVP> 
CODING REGION > 

P C S G V S S V D> 
B DOMAIN > 

I A T I A C A T A V P> 
A DOMAIN > 

190 200 210 
ATCGAAGATCTGAAGAATATGTGTTGCCCAAATTTGTGA 
TAGCTTCTAGACTTCTTATACACAACGGGTTTAAACACT 

IEDLKNMCCPNL*>
CODING REGION > 

I EDLKNMCCPNL> 
A DOMAIN > 



FIG. 27 






ZC334.N4 

10 20 30 40 50 60 

ATGAGAGCTCTCGTCGCTATTCTCTGCCTTATGGCACTATGCCATGCAGCAATGCTCGAT 
TACTCTCGAGAGCAGCGATAAGAGACGGAATACCGTGATACGGTACGTCGTTACGAGCTA 

MRALVAI LCLMALCHAAMLD> 
CODING REGION > 

MRALVA I LCLMALCHA> 
SIGNAL PEPTIDE > 

A M L D> 
> 

70 80 90 100 110 120 

GAGCTGGAGATGCAGAAGGAGGTTCAGGAGTTCCATCACATGAACGGCATGCTCCAAGAG 
CTCGACCTCTACGTCTTCCTCCAAGTCCTCAAGGTAGTGTACTTGCCGTACGAGGTTCTC 

ELEMQKEVQEFHHMNGMLQE> 
CODING REGION > 

ELEMQKEVQEFHHMNGMLQE> 
B DOMAIN > 

130 140 150 160 170 180 

TTCATGAATAAGGGGCTCATCGGGAATCATCACCATGGTACCAAGGCCGGCCTCACCTGC 
AAGTACTTATTCCCCGAGTAGCCCTTAGTAGTGGTACCATGGTTCCGGCCGGAGTGGACG 

FMNKGL IGNHHHGTKAGLTC>
CODING REGION >

FMNKGL IGNHHHGTKAGLTC>
B DOMAIN > 
190 200 210 220 230 240

GGGATGAACATCATCGAGAGAGTCGACAAGCTGTGCAATGGGCAGTGCACTCGGAACTAT 
CCCTACTTGTAGTAGCTCTCTCAGCTGTTCGACACGTTACCCGTCACGTGAGCCTTGATA 

GMNI I ERVDKLCNGQCTRNY> 
CODING REGION > 

GMNI I ERVDKLCNGQCTRNY>
B DOMAIN > 

250 260 270 280 290 300 

GATGCACTCGTCATCAAGTCCTGCCACCGCGGAGTCTCGGACATGGAGTTCATGGTGGCA 
CTACGTGAGCAGTAGTTCAGGACGGTGGCGCCTCAGAGCCTGTACCTCAAGTACCACCGT 
DALV I KSCHRGVSDMEFMVA> 

CODING REGION > 

D A> 
> 

LV I KSCHRGVSDMEFMVA> 
A DOMAIN > 



310 320 330 

TGCTGCCCAACCATGAAGCTATTCATTCACTAA 
ACGACGGGTTGGTACTTCGATAAGTAAGTGATT 

CCPTMKLFIH*>
CODING REGION > 

C C P T M K L F I H> 
A DOMAIN > 
ZC334.N5

10 20 30 40 50 60 

ATGATGCGCTCATTCTTTGTGCTCTTGGCTCTGCTCGCAATAGTCACCAGCACCGCTAGT 
TACTACGCGAGTAAGAAACACGAGAACCGAGACGAGCGTTATCAGTGGTCGTGGCGATCA

MMRSFFVLLALLA I VTSTAS>
CODING REGION >

MMRSFFVLLALLAIVTST> 
SIGNAL PEPTIDE > 

A S> 
> 

70 80 90 100 110 120 

CCCACTTGTGGCAGGGCTCTTCTACACCGGATCCAGTCGGTTTGCGGTCTCTGTACCATC 
GGGTGAACACCGTCCCGAGAAGATGTGGCCTAGGTCAGCCAAACGCCAGAGACATGGTAG
PTCGRALLHRIQSVCGLCTI>

CODING REGION >

PTCGRALLHR IQSVCGLCT I> 
B DOMAIN > 

130 140 150 160 170 180 
GACGCTCACCACGAACTGATTGCCATTGCCTGCTCAAGGGGACTGGGCGATAAGGAAATC 
CTGCGAGTGGTGCTTGACTAACGGTAACGGACGAGTTCCCCTGACCCGCTATTCCTTTAG
DAHHEL IAIACSRGLGDKE I>

CODING REGION >

D A H H E>
_B DOMAIN >

L IAIACSRGLGDKE I>
A DOMAIN > 

190 200 
ATTGAAATGTGCTGTCCAATCTAA
TAACTTTACACGACAGGTTAGATT

I E M C C P I *>
CODING REGION > 

I E M C C P I> 
A DOMAIN > 
ZC334.N6

10 20 30 40 50 

ATGTTCTGTAAATTTGTATTCCTGATCTTTCTACTCATCTCTCTGTCAGT 
TACAAGACATTTAAACATAAGGACTAGAAAGATGAGTAGAGAGACAGTCA 

MFCKFVFL IFLL I SLSV> 
CODING REGION > 

MFCKFVFL IFLL ISLSV> 
SIGNAL PEPTIDE > 



60 70 80 90 100 

GGCCACCGCTGACTTTGGCGCCCAGCGCCGTTGTGGGCGCCACTTGGTGA
CCGGTGGCGACTGAAACCGCGGGTCGCGGCAACACCCGCGGTGAACCACT

ATADFGAQRRCGRHLV> 
CODING REGION > 

A T A> 
> 

DFGAQRRCGRHLV> 
B DOMAIN > 

110 120 130 140 150 

ACTTCCTCGAGGGACTCTGCGGTGGCCCGTGCTCTGAAGCTCCGACTGTT
TGAAGGAGCTCCCTGAGACGCCACCGGGCACGAGACTTCGAGGCTGACAA 
NFLEGLCGGPCSEAPTV> 

CODING REGION > 

NFLEGLCGGPCSEAPTV> 
. B DOMAIN . > 

160 170 180 190 200 

GAACTAGCTTCGTGGGCATGTTCATCAGCAGTCTCAATTCAGGATCTCGA 
CTTGATCGAAGCACCCGTACAAGTAGTCGTCAGAGTTAAGTCCTAGAGCT 

ELASWACSSAVSIQDLE>
CODING REGION > 

E> 
> 

LASWACSSAVSIQDLE> 
A DOMAIN > 



210 220 230 

AAAATTGTGCTGTCCTTCAAATCTTGCTTGA 
TTTTAACACGACAGGAAGTTTAGAACGAACT

KLCCPSNLA*>
CODING REGION > 

K L C C P S N L A> 
A DOMAIN > 
ZC334.N7

10 20 30 40 50 60 

ATGAGTTCTCACGCCCTGGTTCTTTTCCTTCTCCTTTTCCTCCTACCAGTGGCACTGGGC
TACTCAAGAGTGCGGGACCAAGAAAAGGAAGAGGAAAAGGAGGATGGTCACCGTGACCCG 

MSSHALVLFLLLFLLPVALG>
CODING REGION >

MSSHALVLFLLLFLLPVALG>

SIGNAL PEPTIDE >

70 80 90 100 110 120 

CACTTCCTCTCCAAGCCTGCACCGGATCCAAGGATCACATTCAACCGTAAGCTTGCGGAG 
GTGAAGGAGAGGTTCGGACGTGGCCTAGGTTCCTAGTGTAAGTTGGCATTCGAACGCCTC 
HFLSKPAPDPRI TFNRKLAE> 

CODING REGION > 

HFLSKPAPDPRI TFNRKLAE> 
. B DOMAIN > 

130 140 150 160 170 180 

ACACTCAAGGAGCTTCAGGACATGGGACTCATCCAGGCCCCCCGTGAGCCGGTAGTGGCG 
TGTGAGTTCCTCGAAGTCCTGTACCCTGAGTAGGTCCGGGGGGCACTCGGCCATCACCGC

TLKELQDMGL IQAPREPVVA> 
CODING REGION > 

TLKELQDMGL IQAPREPVVA>

B DOMAIN > 



FIG.31A 
190 200 210 220 230 240

GCTCAGGGAGCCAAGAAGACTTGCGGAAGGAGTTTGTTGATAAAGATCCAACAACTCTGC 
CGAGTCCCTCGGTTCTTCTGAACGCCTTCCTCAAACAACTATTTCTAGGTTGTTGAGACG 

AQGAKKTCGRSLL I K I QQL C> 
CODING REGION >

AQGAKKTCGRSLLIKIQQLC>
B DOMAIN > 



250 260 270 280 290 300 
CATGGAATCTGCACAGTTCACGCTGATGACCTCCACGAAACGGCATGCATGAAAGGTCTC 
GTACCTTAGACGTGTCAAGTGCGACTACTGGAGGTGCTTTGCCGTACGTACTTTCCAGAG 
H G ICTVHADDLHETACMKGL> 
CODING REGION >



H G I C T V H A D D>



B DOMAIN > 

LHETACMKGL> 
A DOMAIN > 

310 320 330 340 350 360 

ACCGACTCTCAGCTGATCAACTCCTGCTGCCCACCAATCCCCCAGACACCATTCGTCTTC 
TGGCTGAGAGTCGACTAGTTGAGGACGACGGGTGGTTAGGGGGTCTGTGGTAAGCAGAAG 

TDSQL I NSCCPP I PQTPFVF> 
CODING REGION >

T D S Q L I N S C C P P I P Q T P F V F> 

A DOMAIN > 



TGA 
ACT 
*> 
> 



FIG. 31B
T10D4.N

10 20 30 40 50 
ATGAAGATGCCCTTGATCTTGCTGCTTCTCGTCGCCGCCGCATCGGCGTT 
TACTTCTACGGGAACTAGAACGACGAAGAGCAGCGGCGGCGTAGCCGCAA 
M K M P L I L L L L V A A A S A> 
SIGNAL PEPTIDE > 

F> 
_> 

M K M P L I L L L L V A A A S A F> 
CODING REGION >



60 70 80 90 100 

CGTCCACCACTTTGACCATTCAATGTTTGCCAGACCGGAGAAAACGTGTG
GCAGGTGGTGAAACTGGTAAGTTACAAACGGTCTGGCCTCTTTTGCACAC

VHHFDHSMFARPEKTC>
B DOMAIN 1 >

VHHFDHSMFARPEKTC>

CODING REGION >



110 120 130 140 150 
GAGGACTACTCATTCGTCGTGTCGATAGAATTTGCCCGAATCTAAATTAT 
CTCCTGATGAGTAAGCAGCACAGCTATCTTAAACGGGCTTAGATTTAATA 
G G L L I R R V D R I C P N L N Y> 

B DOMAIN 1_ > 

G G L L I R R V D R I C P N L N Y> 

CODING REGION . > 

160 170 180 190 200 
ACATATAAAATTGAGTGGGAACTTATGGACAACTGTTGCGAAGTGGTTTG 
TGTATATTTTAACTCACCCTTGAATACCTGTTGACAACGCTTCACCAAAC 

TYKIEWELMDNCCEVVC>
A DOMAIN 1_ >

TYKIEWELMDNCCEVVC>

CODING REGION > 



210 220 230 240 250 
CGAGGACCAGTGGATTAAGGAAACCTTTTGCAGAGCGCCCAGGTTCAACT 
GCTCCTGGTCACCTAATTCCTTTGGAAAACGTCTCGCGGGTCCAAGTTGA 

E D Q W I K E T F C R A P R F N> 
A DOMAIN 1 > 

EDQWIKETFCRAPRFN> 
CODING REGION > 



FIG.32A 
260 270 280 290 300
TTTTCGGACCTTCATTCAAAGCCCTTGAAAGATCGTGTGGACCAAAACTG 
AAAAGCCTGGAAGTAAGTTTCGGGAACTTTCTAGCACACCTGGTTTTGAC 

F F G P S F> 
A DOMAIN 1 > 

K A L E R S C G P K L> 

B DOMAIN 2 > 

F F G P S F K A L E R S C G P K L> 

CODING REGION — > 

310 320 330 340 350 
TTCACAAGGGTTAAAACTGTGTGCGGTGAAGACATCAATGTTGATAATAA 
AAGTGTTCCCAATTTTGACACACGCCACTTCTGTAGTTACAACTATTATT
F T R V K T V C G E> 
B DOMAIN 2 > 

D I N V D N K> 

A DOMAIN 2 > 

FTRVKTVCGEDINVDNK>

CODING REGION > 

360 370 380 390 400 
AGTCAAGATTTCGGATCACTGCTGCACACCAGAGGGAGGATGCACAGACG 
TCAGTTCTAAAGCCTAGTGACGACGTGTGGTCTCCCTCCTACGTGTCTGC 

V K I S D H C C T P E G G C T D> 
A DOMAIN 2 > 

V K I S D H C C T P E G G C T D> 
.CODING REGION — > 

410 420 430 440 450 
ACTGGATCAAGGAGAACGTCTGCAAACAGACCAGATTCAACTTTTTCCGA 
TGACCTAGTTCCTCTTGCAGACGTTTGTCTGGTCTAAGTTGAAAAAGGCT 
DWIKENVCKQTRFNFFR> 

A DOMAIN 2 . > 

D W I K E N V C K Q T R F N F F R> 
CODING REGION _ > 



FIG.32B 
460 470 480 490 500
CAATTTCTCGATTCCCCTCAAAGATCATGTGGACCCCAGTTGTTCAAAAG
GTTAAAGAGCTAAGGGGAGTTTCTAGTACACCTGGGGTCAACAAGTTTTC

Q F L>
> 

DSPQRSCGPQLFKR> 

_B DOMAIN 3 > 

Q F L D S P Q R S C G P Q L F K R> 
CODING REGION >



510 520 530 540 550 
AGTGAATACTTTGTGTAATGAAAATATCAATGTTGAAAATAATGTAAGCG
TCACTTATGAAACACATTACTTTTATAGTTACAACTTTTATTACATTCGC 

V N T L C N E> 
B DOMAIN 3 >

N I N V E N N V S> 

A DOMAIN 3 >

VNTLCNEN INVENNVS>
CODING REGION >



560 570 580 590 600 
TGTCGAAAAGCTGTTGCGAATCAGCGGCAGGATGCACGGATGATTGGATT 
ACAGCTTTTCGACAACGCTTAGTCGCCGTCCTACGTGCCTACTAACCTAA 

V S K S C C E S A A G C T D D W I>

_A DOMAIN 3_ > 

V S K S C C E S A A G C T D D W I> 

CODING REGION > 

610 620 630 640 650 
AAGAAGAATGTCTGCACACAGCATAAGCCTTTTGTTTTCCGTCCAGGCTT 
TTCTTCTTACAGACGTGTGTCGTATTCGGAAAACAAAAGGCAGGTCCGAA 
K K N V C T Q H K P F V F R P G F> 

. A DOMAIN 3 > 

K K N V C T Q H K P F V F R P G F> 
CODING REGION > 

TTACTGA 
AATGACT 

Y> 
> 

Y *> 



FIG.32C 
T10D4.N2

10 20 30 40 50 

ATGATTTTCTATCTGACAACCTACCTAGTAACTATGTCACCTCTCTTCCT 
TACTAAAAGATAGACTGTTGGATGGATCATTGATACAGTGGAGAGAAGGA 
M I FYLTTYLVTMSPLF L> 

SIGNAL PEPTIDE > 

MIFYLTTYLVTMSPLFL> 

CODING REGION > 



60 70 80 90 100 

GATCCTGTTGCTTCTAGTCTCTACCACTTACCCTTACATCATTGACTCTT 
CTAGGACAACGAAGATCAGAGATGGTGAATGGGAATGTAGTAACTGAGAA 
I L L L L V S T T Y P> 

SIGNAL PEPTIDE > 

ILLLLVSTTYPYI IDS> 

CODING REGION > 

Y I I D S> 
B DOMAIN > 

110 120 130 140 150 

CGGAGAGTTATGAAGTTCTAATGCTATTCGGGTATAAGAGAACATGTGGA 
GCCTCTCAATACTTCAAGATTACGATAAGCCCATATTCTCTTGTACACCT
SESYEVLMLFGYKRTCG>

CODING REGION >

SESYEVLMLFGYKRTCG>

__B DOMAIN ■ > 

160 170 180 190 200 

CGACGCTTGATGAACAGGATTAATAGAGTATGCGTGAAGGATATAGATCC 
GCTGCGAACTACTTGTCCTAATTATCTCATACGCACTTCCTATATCTAGG 
RRLMNR I NRVCVKD I DP> 

CODING REGION .> 

RRLMNR I NRVCVKD I D> 

B DOMAIN — > 

P> 
> 



FIG.33A 
210 220 230 240 250

AGCAGATATCGATCCGAAGATCAAATTATCGGAGCACTGTTGTATCAAGG 
TCGTCTATAGCTAGGCTTCTAGTTTAATAGCCTCGTGACAACATAGTTCC 
A D I D P K I K L S E H C C I K> 

CODING REGION > 

A D I D P K I K L S E H C C I K> 
A DOMAIN. > 



260 270 280 290 300 

GATGCACAGATGGATGGATCAAGAAGCATATTTGCAGTGAGGAAGTTCTG
CTACGTGTCTACCTACCTAGTTCTTCGTATAAACGTCACTCCTTCAAGAC
GCTDGWIKKH ICSEEVL>

CODING REGION > 

GCTDGWIKKH ICSEEVL> 

A DOMAIN > 

310 320 
AATTTTGGATTTTTTGAAAATTGA 
TTAAAACCTAAAAAACTTTTAACT 
NFGFFEN*> 

CODING REGION > 

N F G F F E N> 
A DOMAIN > 



FIG.33B 
Y522A1.N

10 20 30 40 50 60 

ATGCAAAGCCTACCAATTCTTGCCTGCCTCCTCACACTGTCAGTTTTTGCGCCGGAAATT
TACGTTTCGGATGGTTAAGAACGGACGGAGGAGTGTGACAGTCAAAAACGCGGCCTTTAA 
MQSLP I LACLLTLSVFAPE I> 

SIGNAL PEPTIDE . > 

MQSLP I LACLLTLSVFAPE I> 

CODING REGION > 



70 80 90 100 110 120 

CATGGCCGGGAGCTCAAACGTTGTTCTGTGAAACTTTTTGATATTCTAAGCGTAATTTGT 
GTACCGGCCCTCGAGTTTGCAACAAGACACTTTGAAAAACTATAAGATTCGCATTAAACA 

H G> 



> 



HGRELKRCSVKLFDILSVIC>
CODING REGION . > 



R E L K R C S V K L F D I L S V I C> 



B DOMAIN >



130 140 150 160 170 180 

GGAACTGAAAGTGATGCAGAAATTCTACAAAAAGTCGCAGTGAAATGCTGCCAGGAGCAG 
CCTTGACTTTCACTACGTCTTTAAGATGTTTTTCAGCGTCACTTTACGACGGTCCTCGTC 

GTESDAE I LQKVAVKCCQEQ> 
CODING REGION . > 

G T E S D A E> 
B DOMAIN > 

ILQKVAVKCCQEQ>
A DOMAIN > 

190 200 210 220 230 

TGTGGGTTTGAGGAAATGTGCCAGCATGCCAACTTGAAAATCGACAAAATTTAA 
ACACCCAAACTCCTTTACACGGTCGTACGGTTGAACTTTTAGCTGTTTTAAATT 
CGFEEMCQHANLKIDK I *> 

-CODING REGION > 

CGFEEMCQHANLKIDK I> 

_A DOMAIN > 
SEQUENCE LISTING
<110> EXELIXIS PHARMACEUTICALS, INC.

<120> NUCLEIC ACIDS AND PROTEINS OF C. ELEGANS INSULIN-LIKE
GENES AND USES THEREOF 

<130> 7326-098-228 

<140> PCT/US 9 9/ 
<141> 1999-04-15 

<150> 09/062,580 
<151> 1998-04-17 

<150> 09/074,984 
<151> 1998-05-08 

<150> 09/084,303 
<151> 1998-05-26 

<160> 215 

<170> PatentIn Ver. 2.0 

<210> 1 
<211> 109 
<212> PRT 

<213> Caenorhabditis elegans 
<400> 1 

Met Tyr Trp Phe Arg Gln Val Tyr Arg Pro Ser Phe Phe Phe Gly Phe 
1 5 10 15 

Leu Ala Ile Leu Leu Leu Ser Ser Pro Thr Pro Ser Asp Ala Ser Ile 
20 25 30 

Arg Leu Cys Gly Ser Arg Leu Thr Thr Thr Leu Leu Ala Val Cys Arg 
35 40 45 

Asn Gln Leu Cys Thr Gly Leu Thr Ala Phe Lys Arg Ser Ala Asp Gln 
50 55 60 

Ser Tyr Ala Pro Thr Thr Arg Asp Leu Phe His Ile His His Gln Gln 
65 70 75 80 

Lys Arg Gly Gly Ile Ala Thr Glu Cys Cys Glu Lys Arg Cys Ser Phe 
85 90 95 

Ala Tyr Leu Lys Thr Phe Cys Cys Asn Gln Asp Asp Asn 
100 105 



<210> 2 
<211> 91 
<212> PRT 

<213> Caenorhabditis elegans 
<400> 2 

Met Ser Ser Tyr Arg Gln Thr Leu Phe Ile Leu Ile Ile Leu Ile Val 
1 5 10 15 

Ile Ile Leu Phe Val Asn Glu Gly Gln Gly Ala Pro His His Asp Lys 
20 25 30 

Arg His Thr Ala Cys Val Leu Lys Ile Phe Lys Ala Leu Asn Val Met 
35 40 45 

Cys Asn His Glu Gly Asp Ala Asp Val Leu Arg Arg Thr Ala Ser Asp 
50 55 60 

Cys Cys Arg Glu Ser Cys Ser Leu Thr Glu Met Leu Ala Ser Cys Thr 
65 70 75 80 

Leu Thr Ser Ser Glu Glu Ser Thr Arg Asp Ile 
85 90 



<210> 3 
<211> 106 
<212> PRT 

<213> Caenorhabditis elegans 
<400> 3 

Met Phe Ser Phe Phe Thr Tyr Phe Leu Leu Ser Ala Leu Leu Leu Ser 
1 5 10 15 

Ala Ser Cys Arg Gln Pro Ser Met Asp Thr Ser Lys Ala Asp Arg Ile 
20 25 30 

Leu Arg Glu Ile Glu Met Glu Thr Glu Leu Glu Asn Gln Leu Ser Arg 
35 40 45 

Ala Arg Arg Val Pro Ala Gly Glu Val Arg Ala Cys Gly Arg Arg Leu 
50 55 60 

Leu Leu Phe Val Trp Ser Thr Cys Gly Glu Pro Cys Thr Pro Gln Glu 
65 70 75 80 

Asp Met Asp Ile Ala Thr Val Cys Cys Thr Thr Gln Cys Thr Pro Ser 
85 90 95 

Tyr Ile Lys Gln Ala Cys Cys Pro Glu Lys 
100 105 



<210> 4 
<211> 106 
<212> PRT 

<213> Caenorhabditis elegans 
<400> 4 

Met Asn Ala Ile Ile Phe Cys Leu Leu Phe Thr Thr Val Thr Ala Thr 
1 5 10 15 

Tyr Glu Val Phe Gly Lys Gly Ile Glu His Arg Asn Glu His Leu Ile 
20 25 30 

Ile Asn Gln Leu Asp Ile Ile Pro Val Glu Ser Thr Pro Thr Pro Asn 
35 40 45 

Arg Ala Ser Arg Val Gln Lys Arg Leu Cys Gly Arg Arg Leu Ile Leu 
50 55 60 

Phe Met Leu Ala Thr Cys Gly Glu Cys Asp Thr Asp Ser Ser Glu Asp 
65 70 75 80 

Leu Ser His Ile Cys Cys Ile Lys Gln Cys Asp Val Gln Asp Ile Ile 
85 90 95 

Arg Val Cys Cys Pro Asn Ser Phe Arg Lys 
100 105 



<210> 5 
<211> 107 
<212> PRT 

<213> Caenorhabditis elegans 
<400> 5 

Met Lys Leu Ser Val Val Leu Ala Leu Phe Ile Ile Phe Gln Leu Gly 
1 5 10 15 

Ala Ala Ser Leu Met Arg Asn Trp Met Phe Asp Phe Glu Lys Glu Leu 
20 25 30 

Glu His Asp Tyr Asp Asp Ser Glu Ile Gly Phe His Asn Ile His Ser 
35 40 45 

Leu Met Ala Arg Ser Arg Arg Gly Asp Lys Val Lys Ile Cys Gly Thr 
50 55 60 

Lys Val Leu Lys Met Val Met Val Met Cys Gly Gly Glu Cys Ser Ser 
65 70 75 80 

Thr Asn Glu Asn Ile Ala Thr Glu Cys Cys Glu Lys Met Cys Thr Met 
85 90 95 

Glu Asp Ile Thr Thr Lys Cys Cys Pro Ser Arg 
100 105 



<210> 6 
<211> 112 
<212> PRT 

<213> Caenorhabditis elegans 
<400> 6 

Met Asn Ser Val Phe Thr Ile Ile Phe Val Leu Cys Ala Leu Gln Val 
1 5 10 15 

Ala Ala Ser Phe Arg Gln Ser Phe Gly Pro Ser Met Ser Glu Glu Ser 
20 25 30 

Ala Ser Met Gln Leu Leu Arg Glu Leu Gln His Asn Met Met Glu Ser 
35 40 45 

Ala His Arg Pro Met Pro Arg Ala Arg Arg Val Pro Ala Pro Gly Glu 
50 55 60 

Thr Arg Ala Cys Gly Arg Lys Leu Ile Ser Leu Val Met Ala Val Cys 
65 70 75 80 

Gly Asp Leu Cys Asn Pro Gln Glu Gly Lys Asp Ile Ala Thr Glu Cys 
85 90 95 

Cys Gly Asn Gln Cys Ser Asp Asp Tyr Ile Arg Ser Ala Cys Cys Pro 
100 105 110 
<210> 7 
<211> 100 
<212> PRT 

<213> Caenorhabditis elegans 
<400> 7 

Met His Ser Ile Val Ala Leu Met Leu Ile Gly Thr Ile Leu Pro Ile 
1 5 10 15 

Ala Ala Leu His Gln Lys His Gln Gly Phe Ile Leu Ser Ser Ser Asp 
20 25 30 

Ser Thr Gly Asn Gln Pro Met Asp Ala Ile Ser Arg Ala Asp Arg His 
35 40 45 

Thr Asn Tyr Arg Ser Cys Ala Leu Arg Leu Ile Pro His Val Trp Ser 
50 55 60 

Val Cys Gly Asp Ala Cys Gln Pro Gln Asn Gly Ile Asp Val Ala Gln 
65 70 75 80 

Lys Cys Cys Ser Thr Asp Cys Ser Ser Asp Tyr Ile Lys Glu Ile Cys 
85 90 95 

Cys Pro Phe Asp 
100 



<210> 8 
<211> 105 
<212> PRT 

<213> Caenorhabditis elegans 
<400> 8 

Met Pro Pro Ile Ile Leu Val Phe Phe Leu Val Leu Ile Pro Ala Ser 
1 5 10 15 

Gln Gln Tyr Pro Phe Ser Leu Glu Ser Leu Asn Asp Gln Ile Ile Asn 
20 25 30 

Glu Glu Val Ile Glu Tyr Met Leu Glu Asn Ser Ile Arg Ser Ser Arg 
35 40 45 

Thr Arg Arg Val Pro Asp Glu Lys Lys Ile Tyr Arg Cys Gly Arg Arg 
50 55 60 

Ile His Ser Tyr Val Phe Ala Val Cys Gly Lys Ala Cys Glu Ser Asn 
65 70 75 80 

Thr Glu Val Asn Ile Ala Ser Lys Cys Cys Arg Glu Glu Cys Thr Asp 
85 90 95 

Asp Phe Ile Arg Lys Gln Cys Cys Pro 
100 105 



<210> 9 
<211> 104 
<212> PRT 

<213> Caenorhabditis elegans 
<400> 9 

Met Ser Pro Ile Ile Leu Ile Phe Phe Leu Val Phe Ile Pro Phe Ser 
1 5 10 15 

Gln Gln His Thr Ser Leu Glu Glu Ser Leu Asn Asp Arg Ile Ile Ser 
20 25 30 

Glu Glu Val Val Glu Met Leu Ser Glu Lys Glu Ile Arg Pro Ser Arg 
35 40 45 

Val Arg Arg Val Pro Glu Gln Lys Asn Lys Leu Cys Gly Lys Gln Val 
50 55 60 

Leu Ser Tyr Val Met Ala Leu Cys Glu Lys Ala Cys Asp Ser Asn Thr 
65 70 75 80 

Lys Val Asp Ile Ala Thr Lys Cys Cys Arg Asp Ala Cys Ser Asp Glu 
85 90 95 

Phe Ile Arg His Gln Cys Cys Pro 
100 



<210> 10 
<211> 118 
<212> PRT 

<213> Caenorhabditis elegans 
<400> 10 

Met Ile Val Thr Leu Ile Val Phe Leu Val Ile Gly Leu Gln Met Ala 
1 5 10 15 

His Leu Ser Gln Val Ser Gly Asn Asn Glu Asn Gly Phe Leu Asn Pro 
20 25 30 

Phe Asp Leu Ser Gln Trp Ser Glu Glu Ile Leu His Arg Gln Tyr His 
35 40 45 

His His His His His His His Gly Asn Arg Ala Arg Arg Thr Leu Glu 
50 55 60 

Thr Glu Lys Ile Tyr Arg Cys Gly Arg Lys Leu Tyr Thr Asp Val Leu 
65 70 75 80 

Ser Ala Cys Asn Gly Pro Cys Glu Pro Gly Thr Glu Gln Asp Leu Ser 
85 90 95 

Lys Leu Cys Cys Gly Asn Gln Cys Thr Phe Val Glu Ile Arg Lys Ala 
100 105 110 

Cys Cys Ala Asp Lys Leu 
115 



<210> 11 
<211> 86 
<212> PRT 

<213> Caenorhabditis elegans 
<400> 11 

Met Gln Ser Asn Ile Thr Ala Ser Leu Phe Ile Ala Leu Leu Ile Phe 
1 5 10 15 

Gly Val Ile Ser Ala Ala Pro Ser His Glu Lys Thr His Lys Lys Cys 
20 25 30 

Ser Asp Lys Leu Tyr Leu Ala Met Lys Ser Leu Cys Ser Tyr Arg Gly 
35 40 45 

Tyr Ser Glu Phe Leu Arg Asn Ser Ala Thr Lys Cys Cys Gln Asp Asn 
50 55 60 

Cys Glu Ile Ser Glu Met Met Ala Leu Cys Val Val Ala Pro Asn Phe 
65 70 75 80 

Asp Asp Asp Leu Leu His 
85 



<210> 12 
<211> 76 
<212> PRT 

<213> Caenorhabditis elegans 
<400> 12 

Met Lys Thr Tyr Ser Phe Phe Val Leu Phe Ile Val Phe Ile Phe Phe 
1 5 10 15 

Ile Ser Ser Ser Lys Ser His Ser Lys Lys His Val Arg Phe Leu Cys 
20 25 30 

Ala Thr Lys Ala Val Lys His Ile Arg Lys Val Cys Pro Asp Met Cys 
35 40 45 

Leu Thr Gly Glu Glu Val Glu Val Asn Glu Phe Cys Lys Met Gly Tyr 
50 55 60 

Ser Asp Ser Gln Ile Lys Tyr Ile Cys Cys Pro Glu 
65 70 75 



<210> 13 
<211> 83 
<212> PRT 

<213> Caenorhabditis elegans 
<400> 13 

Met His Thr Thr Thr Ile Leu Ile Cys Phe Phe Ile Phe Leu Val Gln 
1 5 10 15 

Val Ser Thr Met Asp Ala His Thr Asp Lys Tyr Val Arg Thr Leu Cys 
20 25 30 

Gly Lys Thr Ala Ile Arg Asn Ile Ala Asn Leu Cys Pro Pro Lys Pro 
35 40 45 

Glu Met Lys Gly Ile Cys Ser Thr Gly Glu Tyr Pro Ser Ile Thr Glu 
50 55 60 

Tyr Cys Ser Met Gly Phe Ser Asp Ser Gln Ile Lys Phe Met Cys Cys 
65 70 75 80 

Asp Asn Gln 



<210> 14 
<211> 76 
<212> PRT 

<213> Caenorhabditis elegans 
<400> 14 

Met Phe Val Leu Leu Ile Ile Leu Ser Ile Ile Leu Ala Gln Val Thr 
1 5 10 15 

Asp Ala His Ser Glu Leu His Val Arg Arg Val Cys Gly Thr Ala Ile 
20 25 30 

Ile Lys Asn Ile Met Arg Leu Cys Pro Gly Val Pro Ala Cys Glu Asn 
35 40 45 

Gly Glu Val Pro Ser Pro Thr Glu Tyr Cys Ser Met Gly Tyr Ser Asp 
50 55 60 

Ser Gln Val Lys Tyr Leu Cys Cys Pro Thr Ser Gln 
65 70 75 



<210> 15 
<211> 80 
<212> PRT 

<213> Caenorhabditis elegans 
<400> 15 

Met Asp Lys Pro Ser Tyr Leu Ser Ser Lys Glu Ala Trp Lys Met Leu 
1 5 10 15 

Asn Glu Leu Leu Lys Glu Pro Lys His His His His His His Arg His 
20 25 30 

Lys Gly Tyr Cys Gly Val Lys Ala Val Lys Lys Leu Lys Gln Ile Cys 
35 40 45 

Pro Asp Leu Cys Ser Asn Val Asp Asp Asn Leu Leu Met Glu Met Cys 
50 55 60 

Ser Lys Asn Leu Thr Asp Asp Asp Ile Leu Gln Arg Cys Cys Pro Glu 
65 70 75 80 



<210> 16 
<211> 108 
<212> PRT 

<213> Caenorhabditis elegans 
<400> 16 

Met Phe Ser Thr Arg Gly Val Leu Leu Leu Leu Ser Leu Met Ala Ala 
1 5 10 15 

Val Ala Ala Phe Gly Leu Phe Ser Arg Pro Ala Pro Ile Thr Arg Asp 
20 25 30 

Thr Ile Arg Pro Pro Arg Ala Lys His Gly Ser Leu Lys Leu Cys Pro 
35 40 45 

Pro Gly Gly Ala Ser Phe Leu Asp Ala Phe Asn Leu Ile Cys Pro Met 
50 55 60 

Arg Arg Arg Arg Arg Ser Val Ser Glu Asn Tyr Asn Asp Gly Gly Gly 
65 70 75 80 

Ser Leu Leu Gly Arg Thr Met Asn Met Cys Cys Glu Thr Gly Cys Glu 
85 90 95 

Phe Thr Asp Ile Phe Ala Ile Cys Asn Pro Phe Gly 
100 105 

<210> 17 
<211> 95 
<212> PRT 

<213> Caenorhabditis elegans 
<400> 17 

Met Val His Arg Leu Phe Ile Val Leu Ile Ala Ile Ile Leu Val Ala 
1 5 10 15 

Lys Ser Thr Ala Ile Ser Leu Gln Gln Ala Asp Gly Arg Met Lys Met 
20 25 30 

Cys Pro Pro Gly Gly Ser Thr Phe Thr Met Ala Trp Ser Met Ser Cys 
35 40 45 

Ser Met Arg Arg Arg Lys Arg Asp Val Gly Arg Tyr Phe Glu Lys Arg 
50 55 60 

Ala Leu Ile Ala Pro Ser Ile Arg Gln Leu Gln Thr Ile Cys Cys Gln 
65 70 75 80 

Val Gly Cys Asn Val Glu Asp Leu Leu Ala Tyr Cys Ala Pro Ile 
85 90 95 
<210> 18 
<211> 83 
<212> PRT 

<213> Caenorhabditis elegans 
<400> 18 

Met Lys Phe Phe Arg Leu Ile Leu Leu Cys Ala Leu Val Leu Thr Thr 
1 5 10 15 

Met Ala Phe Leu Ala Pro Ser Thr Ala Ala Lys Arg Arg Cys Gly Arg 
20 25 30 

Arg Leu Ile Pro Tyr Val Tyr Ser Ile Cys Gly Gly Pro Cys Glu Asn 
35 40 45 

Gly Asp Ile Ile Ile Glu His Cys Phe Ser Gly Thr Thr Pro Thr Ile 
50 55 60 

Ala Glu Val Gln Lys Ala Cys Cys Pro Glu Leu Ser Glu Asp Pro Thr 
65 70 75 80 

Phe Ser Ser 



<210> 19 
<211> 321 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 19 

atgttttcat tctttacata tttccttctc tccgcacttc ttctctccgc ttcatgtcga 60 
caaccttcca tggacaccag caaagccgat cgtattctac gagagatcga aatggaaaca 120 
gaactcgaaa atcaactctc ccgagcacga cgagtcccag ctggagaggt tcgtgcctgt 180 
ggaagacgac ttcttctctt tgtctggtca acctgtggag aaccatgcac gccacaagag 240 
gacatggaca ttgccacagt ttgctgcaca acacagtgca ctccatcata tataaaacaa 300 
gcttgctgcc cagaaaagta a 321 

<210> 20 
<211> 321 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 20 

atgttttcat tctttacata tttccttctc tccgcacttc ttctctccgc ttcatgtcga 60 
caaccttcca tggacaccag caaagccgat cgtattctac gagagatcga aatggaaaca 120 
gaactcgaaa atcaactctc ccgagcacga cgagtcccag ctggagaggt tcgtgcctgt 180 
ggaagacgac ttcttctctt tgtctggtca acctgtggag aaccatgcac gccacaagag 240 
gacatggaca ttgccacagt ttgctgcaca acacagtgca ctccatcata tataaaacaa 300 
gcttgctgcc cagaaaagta a 321 

<210> 21 
<211> 321 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 21 

atgaacgcta taatcttctg tctcctcttc acaactgtca ctgccactta tgaagttttc 60 
ggaaaaggaa tagaacacag aaatgaacat ttgatcatca atcaacttga tatcatacca 120 
gttgagtcaa ctccaactcc aaaccgtgcc tcaagagtcc agaaacgtct atgcggaaga 180 
cgtcttattt tattcatgct tgcaacatgt ggagaatgtg atacagattc atcagaagac 240 
ctttcgcata tttgctgcat aaaacaatgt gacgttcaag atatcatcag agtctgctgc 300 
ccgaattcat ttagaaaata g 321 

<210> 22 
<211> 324 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 22 

atgaaactct ccgttgttct tgcacttttc attattttcc aacttggagc tgcaagtctt 60 
atgcgtaact ggatgttcga ttttgagaaa gaattggaac acgattatga tgattcggaa 120 
attggattcc ataacattca ctccctgatg gccagatcaa gaagaggaga caaagtgaag 180 
atttgtggta caaaagttct gaaaatggtg atggtaatgt gtggaggaga atgttcatca 240 
acgaatgaga acatcgctac agaatgctgt gaaaaaatgt gcacaatgga agatataact 300 
actaagtgct gcccttcaag atga 324 



<210> 23 
<211> 339 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 23 

atgaactctg tctttactat cancttcgtt ttgtgcgcac tccaagtcgc tgcaagtttc 60 
cgtcaatcct tcggtccttc aatgtctgaa gaatcagcaa gcatgcaact tctccgtgaa 120 
cttcaacaca acatgatgga atcagctcac cgaccaatgc cacgagcaag acgtgttcca 180 
gcaccaggag aaactcgtgc ctgcggaaga aaactcatct ctttagtcat ggctgtctgt 240 
ggagatcttt gcaacccaca agaaggaaag gacattgcga ctgaatgctg cggaaatcag 300 
tgttctgatg actacataag atctgcttgt tgtccatga 339 



<210> 24 
<211> 303 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 24 

atgcactcga tcgtcgcctt gacgctcatc ggaacaattc tcccaatcgc tgctcttcac 60 
cagaagcatc aaggcttcat cctgtcgtca tccgattcaa ccggaaacca accaatggat 120 
gcgatctcaa gagccgaccg tcacaccaac taccgatcat gcgcattgcg gctcatcccg 180 
catgtctggt cggtgtgcgg tgacgcctgc caaccacaaa acggaatcga tgtcgctcaa 240 
aaatgttgct ccactgattg cagctccgat tacatcaaag aaatctgctg cccatttgac 300 
taa 303 



<210> 25 
<211> 318 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 25 

atgccaccaa taattttggt tttctttttg gttttaatcc ctgcttctca acaatatcct 60 
ttttcactgg agtccttaaa tgatcaaata atcaatgaag aagtaatcga atatatgctt 120 
gaaaattcaa ttaggtccag cagaaccaga agagtccctg acgagaaaaa aatttatcgt 180 
tgtggaagaa gaatacattc gtatgtgttt gcggtttgtg gaaaagcatg cgaatcgaat 240 
actgaagtta atactgcatc aaaatgttgc cgtgaagaat gcaccgacga cttcattcga 300 
aaacagtgct gtccttaa 318 



<210> 26 
<211> 315 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 26 

atgtcgccaa tcattttgat tttctttttg gttttcattc cgttttctca acaacacaca 60 
tctttagagg agtccttaaa tgatcgaata atcagtgaag aagtagtcga aatgctatca 120 
gagaaagaaa ttagacccag cagagtaaga agagtccctg aacaaaaaaa taaattgtgc 180 
ggaaagcaag tcttatccta cgttatggca ctttgtgaaa aagcatgcga ttcaaataca 240 
aaagtcgata ttgcgacaaa atgttgccgc gatgcatgct cagacgaatt cattcgacat 300 
caatgttgtc cttaa 315 



<210> 27 
<211> 357 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 27 

atgatcgtca ctttgattgt ctttcttgtc attggacttc aaatggcaca cctttctcaa 60 
gtatctggaa acaacgaaaa tggattctta aatccatttg atttgtctca atggagcgaa 120 
gaaatcctcc accgtcagta tcatcatcac caccaccatc accatggaaa tcgggcgaga 180 
agaaccttgg aaaccgaaaa aatctaccgc tgtggaagaa aactctacac tgatgtgcta 240 
tcagcgtgca acgggccatg tgaaccgggt acggaacagg atctctctaa gctgtgctgt 300 
ggaaaccaat gtactttcgt tgaaatcagg aaagcatgct gtgccgacaa attgtaa 357 



<210> 28 
<211> 276 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 28 

atgtctagtt accgtcaaac attgttcatt cttattattc ttattgtaat tattctcttc 60 
gtcaatgagg gtcaaggagc gcctcaccat gacaaacggc acactgcatg cgtcctaaag 120 
attttcaagg cgctaaacgt tatgtgtaat catgaaggtg atgcagatgt tctgaggaga 180 
acagcatccg actgctgtcg ggagagctgc tcgctaacag aaatgttagc gagctgcacc 240 
cccaccagct cagaagagtc aactcgggac atttaa 276 

<210> 29 
<211> 261 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 29 

atgcaatcaa acatcaccgc ttcattattc atagcgttgc ttatatttgg agcaatcagt 60 
gcagctccat ctcatgaaaa aacacacaaa aaatgctctg ataaattata tttggcgatg 120 
aagtcgttgt gtagtnatcg aggttatagt gaattcttaa gaaattctgc aactaagtgt 180 
tgccaagaca attgtgagat ttcggaaatg atggcgttgt gtgttgttgc tcccaatttt 240 
gacgacgatc icct:catta a 261 

<210> 30 
<211> 231 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 30 

atgaaaacct actcattttt cgtgcttttt attgtattca tcttttttat ttcttcatca 60 
aaatctcatt caaagaaaca tgttcgtttc ctttgtgcaa caaaagcggt caaacacatt 120 
cggaaagtat gccctgatat gtgtctcact ggagaagaag tcgaagtcaa tgagttttgc 180 
aagatggggt actcggattc tcaaatcaag tacatttgct gtcccgaata a 231 

<210> 31 
<211> 252 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 31 

atgcacacta caactattct catatgcttt ttcatctttc ttgttcaagt ctccacaatg 60 
gatgctcaca ctgacaaata cgtcagaact ctgtgtggaa aaactgcaat cagaaatatt 120 
gccaaccttt gcccgccaaa gccagaaatg aagggtatct gttctaccgg agagtatcca 180 
agcatcaccg aatactgttc catgggattt tcagactctc agatcaagtt tatgtgctgt 240 
gataaccaat ga 252 



<210> 32 
<211> 231 
<212> DNA 
<213> Caenorhabditis elegans 
<400> 32 

atgttcgttc ttcttattat tctctctatc attctggctc aagtcactga tgctcattca 60 
gagcttcacg ttcgtagggt gtgcggaact gctatcataa agaacataat gcgattgtgc 120 
ccaggggtac cggcttgcga aaatggagaa gttccaagtc caaccgagta ctgttcaatg 180 
gggtactcag acagccaggt aaaataccta tgctgtccaa cttctcagtg a 231 



<210> 33 
<211> 243 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 33 

atggacaaac catcctacct gtcatccaaa gaagcatgga aaatgctaaa tgagctgctg 60 
aaagagccga aacatcatca tcatcatcac aggcacaaag gatattgtgg agttaaagct 120 
gtaaagaaat taaaacaaat ctgtccagat ctttgctcga atgttgatga taaccttctc 180 
acggaaatgt gctcaaaaaa cctgacggat gatgatattt tgcaacggtg ctgtccagaa 240 
tga 243 



<210> 34 
<211> 327 
<212> DNA 

<400> 34 

atgttctcga ccagaggggt actcctttta ctgtctttga tggctgctgt agccgcattc 60 
gggctgtttt ctagaccggc tccaatcact cgggacacta tccgaccacc acgtgccaaa 120 
cacggttcgc tgaaattatg cccaccaggt ggtgcctcat cccttgacgc tttcaacttg 180 
atttgcccaa tgcgccgtcg acgcaggagt gtttcagaaa actacaacga cggcggtggc 240 
agccttttgg gacggacaat gaatatgtgc tgtgagacgg gatgtgaatt cactgacatt 300 
ttcgcaatct gcaatccttt tggataa 327 



<210> 35 
<211> 288 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 35 

atggtccacc gacttttcat cgtccttatt gcaattattc ttgtcgcaaa atcaactgca 60 
atctcacttc aacaagctga cggacgcatg aaaatgtgcc caccaggtgg ttcaacattc 120 
acaatggcat ggtcaatgtc gtgttcgatg cgcaggagaa aacgagatgt tggacgatat 180 
ttcgaaaaac gtgctctgat cgccccatca atccgtcaac ttcaaacaat ttgctgtcaa 240 
gttggttgca acgtggaaga tcttcttgcc tactgtgccc caatttaa 288 



<210> 36 
<211> 252 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 36 

atgaaattct tccgcttaat cttgctctgc gcccttgtcc tgaccaccat ggcttttttg 60 
gctccaagta cggcagccaa gaggcgttgt ggccgccgct taattcccta tgtctattca 120 
atatgcggcg gcccgtgcga gaacggagat attatcatcg agcactgctt ctccggaaca 180 
actcccacca ttgccgaagt ccaaaaggct tgctgtcctg aactatctga agacccaact 240 
tcctcatctt aa 252 

<210> 37 
<211> 24 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 37 

gacggagatg gcttgttgga cgac 24 

<210> 38 
<211> 22 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 38 

ggtttaatta cccaagtttg ag 22 

<210> 39 
<211> 24 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 39 

caagagaatg ttttcattct ttac 24 

<210> 40 
<211> 24 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 40 

ttacttttct gggcagcaag cttg 24 

<210> 41 
<211> 24 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 41 

ctaccatgaa cgctataatc ttct 24 
<210> 42 
<211> 24 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 42 

atgatagtac gatatgtcca taac 24 

<210> 43 
<211> 25 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 43 

cctattttcc agccacagca ctctc 25 

<210> 44 
<211> 24 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 44 

rr.rrgtFirtr a 1 1 t 1 r. ng 1 t atrn 24 

<210> 45 
<211> 24 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 45 

gtatggtaca gagactgata tcgg 24 

<210> 46 
<211> 24 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 46 

caaggaaaat gcactcgatc gtcg 24 

<210> 47 
<211> 33 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 47 

cccaagcttt gttatttaat gatgtggaga tgg 33 
<210> 48 
<211> 32 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 48 

gctctagaat ggtaaataca gaacattggt tc 32 

<210> 49 
<211> 32 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 49 

gctctagaat gacggtaggt gtgtagatga ac 32 

<210> 50 
<211> 24 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 50 



<210> 51 
<211> 24 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 51 

gatagaagaa attaaggaca gcac 24 

<210> 52 
<211> 24 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 52 

gtaaacgatt agattaagga caac 24 

<210> 53 
<211> 24 
<212> DNA 

<213> Caenorhabditis elegans 



<400> 53 

gaggagtgaa acgatgatcg tcac 24 
<210> 54 
<211> 24 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 54 

atccaattga gaagacgatt gttg 24 

<210> 55 
<211> 33 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 55 

cccaagcttt tgaaccatga aaacctactc att 33 

<210> 56 
<211> 32 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 56 



yLLL Lay a^jU l, u l, k_ u l, l_ i_ a. o ^CgggaCSgC au 



<210> 57 
<211> 32 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 57 

cccaagcttg gatttctgga atttcgataa tg 32 

<210> 58 
<211> 31 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 58 

gctctagagc agcatagaat ggcggaagat c 31 

<210> 59 
<211> 33 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 59 

cccaagcttg tgtaggaatc gttaaatatg tct 33 
<210> 60 
<211> 32 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 60 

gctctagaga gatcatatta :attacacga ac 32 

<210> 61 
<211> 33 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 61 

cccaagcttc cgctctcaac aacgggccac acg 33 

<210> 62 
<211> 32 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 62 



<210> 63 
<211> 31 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 63 

cccaagcttg gtttaattac ccaagtttga g 31 

<210> 64 
<211> 32 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 64 

gctctagatg atgcgtattt tgtgggcggt ac 32 

<210> 65 
<211> 33 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 65 

gctctagact catzcagtitga aaaugaattt aag 33 
<210> 66 
<211> 33 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 66 

cccaagcttg gcataagcga gtatctgtga tcc 33 

<210> 67 
<211> 33 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 67 

ccgctcgagg taaagcgagg gtaaagtaga tcg 33 

<210> 68 
<211> 33 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 68 

cccaagcttc taaccaacaa aaatgcacac tac 33 

<210> 69 
<211> 32 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 69 

gctctagaca cgtgaacaat ctttatcttt at 32 

<210> 70 
<211> 34 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 70 

cccaagcttc acagccaaaa acaaaaatgc aatc 34 

<210> 71 
<211> 32 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 71 

gctctagaca cagtatttta atgaaggaga tc 32 
<210> 72 
<211> 32 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 72 

ttgggcgcgc cgtcttgcat gcagttgtca cg 32 

<210> 73 
<211> 35 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 73 

ccaaccggta tcattgcgta ctgtcgtagc gtgtg 35 

<210> 74 
<211> 34 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 74 

ttgggcgcgc ctgctaccgt gggaatttta caag 34 

<210> 75 
<211> 35 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 75 

ccaaccggta tcatggtaga ttttagaatg gaaag 35 

<210> 76 
<211> 34 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 76 

ttgggcgcgc cggagttcat ctggaggtca catc 34 

<210> 77 
<211> 39 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 77 

ccaaccggta tcattattca gaacaggaat tgataaatg 39 
<210> 78 
<211> 32 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 78 

ttgggcgcca gataaataca gaatgggcgg ag 32 

<210> 79 
<211> 36 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 79 

ccaaccggta tcattctctt ggagcttttg aaaaac 36 

<210> 80 
<211> 33 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 80 

t^gygcgcy c uciyL^gtcca acaagccatc tcc 33 

<2 10> 81 
<211> 32 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 81 

ccaaccggtt gcattttcct tgaagattga ag 32 

<210> 82 
<211> 33 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 82 

ttgggcgcgc ctagattttc tccattcaca aac 33 

<210> 83 
<211> 34 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 83 

ccaaccggta tcattataat gatatggata acgg 34 
<210> 84 
<211> 34 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 84 

ttgggcgcgc caatcgtttt catcattttg cttc 34 

<210> 85 
<211> 34 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 85 

ccaaccggta tcatctggaa aagtaatatt atat 34 

<210> 86 
<211> 34 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 86 

ttgggcgcgc cLydddLcLL tatatcctct ccac 34 

<210> 87 
<211> 34 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 87 

ccaaccggta tcatctggaa ataattaata tcag 34 

<210> 88 
<211> 34 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 88 

ttgggcgcgc ctaacacgtg cattggaggc ggag 34 

<210> 89 
<211> 36 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 89 

ccaacggtat catcgtttca ctcctcgaat tatttg 36 
<210> 90 
<211> 33 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 90 

ttgggcgcgc cattggtatc acaaggatca agc 33 

<210> 91 
<211> 32 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 91 

ccaaccggca tttttgtttt tggctgtgat ta 32 

<210> 92 
<211> 33 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 92 

ttgggcgcgc caattttgac gacgatcncc ctc 33 

<210> 93 
<211> 37 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 93 

ccaaccggta tcatatttaa cgattcctac acaaacc 37 

<210> 94 
<211> 30 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 94 

ttgggcgcgc cgtgtggagg tggtgaatcc 30 

<210> 95 
<211> 33 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 95 

cggggtaccc tcatttcaaa gaaatgttga ata 33 
<210> 96 
<211> 34 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 96 

ttgggcgcgc cggagccgaa caagaaaaac ctac 34 

<210> 97 
<211> 33 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 97 

ccaaccggtt tcatggttca actcaaaaag gaa 33 

<210> 98 
<211> 34 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 98 



<210> 99 
<211> 33 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 99 

ccaaccggtt tcatggttca actcaaaaag gaa 33 

<210> 100 
<211> 32 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 100 

ttgggcgcgc catgggattt tcagactctc ag 32 

<210> 101 
<211> 35 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 101 

ccaaccggta acattatcaa aa:tccagaa atccc 35 
<210> 102 
<211> 32 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 102 

ttgggcgcgc cacttcggac agatgtgaca cg 32 

<210> 103 
<211> 35 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 103 

cggggtacct gcattgtaaa agtgattttg aaaat 35 

<210> 104 
<211> 22 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 104 



<210> 105 
<211> 22 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 105 

gcatacggta cctatrcgtt tc 22 

<210> 106 
<211> 21 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 106 

agctcaaagg ccaaatgtgt g 21 

<210> 107 
<211> 22 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 107 

aacaaaccct acagttactg cg 22 
<210> 108 
<211> 22 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 108 

gctatccacc tgtccaacct ac 22 

<210> 109 
<211> 22 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 109 

ggaggctctt tactcgcctt ac 22 

<210> 110 
<211> 22 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 110 

tacaggctgt ccttctgtta cg 22 

<210> 111 
<211> 22 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 111 

tccactattc cggtaatacc tc 22 

<210> 112 
<211> 22 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 112 

gcaagaaatc gagagtcacg cc 22 

<210> 113 
<211> 22 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 113 

ctgcctcaag gaggagttac ac 22 
<210> 114 
<211> 22 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 114 

ctgcctcaag gaggagttac ac 22 

<210> 115 
<211> 22 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 115 

atttatcccc acgtgagaga gg 22 

<210> 116 
<211> 21 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 116 

cactggatga cagatttgat g 21 

<210> 117 
<211> 21 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 117 

tgatgagaca cgggtgaaac g 21 

<210> 118 
<211> 21 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 118 

gaacggataa aaaggcggag c 21 

<210> 119 
<211> 22 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 119 

ttgatgtgac ctccagatga ac 22 
<210> 120 
<211> 22 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 120 

gcagcacact cttgttttca gc 22 

<210> 121 
<211> 20 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 121 

caaatcactc acttcctgcg 20 

<210> 122 
<211> 22 
<212> DNA 

<213> Caenorhabditis elegans 

<400> 122 
tr.caaatatr: 



<210> 123 
<211> 22 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 123 

gcatagaatg gcggaagatc ac 22 

<210> 124 
<211> 22 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 124 

cttccaaatt tgtcctgact gc 22 

<210> 125 
<211> 22 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 125 

aattgcagga gtcgaagttt cc 22 
<210> 126 
<211> 22 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 126 

aacgagcaga caggaaatca tc 22 

<210> 127 
<211> 22 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 127 

tgtgacagca tgtttgaacg tc 22 

<210> 128 
<211> 22 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 128 

agttgtcaag aagtgcgtca ag 22 

<210> 129 
<211> 20 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 129 

gagatggctt gttggacgac 20 

<210> 130 
<211> 21 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 130 

gacaaaatca cgtcacgaag t 21 

<210> 131 
<211> 21 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 131 

ttacttttct gggcagcaag c 21 
<210> 132 
<211> 30 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 132 

cgtgggtatt ccttgttcga agccagctac 30 

<210> 133 
<211> 23 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 133 

tcaagtcaaa tggatgcttg aga 23 

<210> 134 
<211> 30 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 134 

tcacaagctg atcgactcga tgccacgtcg 30 



<210> 135 
<211> 25 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 135 

gattttgtga acactgtggt gaagt 25 

<210> 136 
<211> 22 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 136 

ttattacatc cgtcactgcg tc 22 

<210> 137 
<211> 22 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 137 

gcgtccttat tcagaartcc ag 22 
<210> 138 
<211> 22 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 138 

crtgtgactt caagcccact tc 22 

<210> 139 
<211> 22 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 139 

ggttatgaac cgattaggct cc 22 

<210> 140 
<211> 21 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 140 

gLdycctLuc yyyy Lctctetat c 21 

<210> 141 
<211> 21 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 141 

gatctcgcgc tatgttttga g 21 

<210> 142 
<211> 21 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 142 

gacagctgaa gctgaccaaa c 21 

<210> 143 
<211> 22 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 143 

caggagttaa acgtggtcao tg 22 
<210> 144 
<211> 31 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 144 

cccaagcttg gtttaattac ccaagtttga g 31 

<210> 145 
<211> 34 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 145 

gctctagata attcaatgaa aaggcaaaac gacg 34 

<210> 146 
<211> 18 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 146 

Lciy ciciyy ccio <=iyi-oyciyy IS 

<210> 147 
<211> 19 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 147 

taatacgact actataggg 19 

<210> 148 
<211> 34 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 148 

cccaagcttc ttcatttggg cttcatttta ccac 34 

<210> 149 
<211> 32 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 149 

gctctagaga aacaatctt: ttattcaaca tg 32 
<210> 150
<211> 34
<212> DNA

<213> Caenorhabditis elegans 
<400> 150 

ccgctcgagc tcgacgttct tcaatctata tttc 34 

<210> 151 
<211> 35 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 151

gctctagaca aacaccatta aatctgtatt taaac 35 

<210> 152 
<211> 34 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 152 

ccgctcgagc tcgacgttct tcaatctata tttc 34

<210> 153 
<211> 33 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 153 

gctctagagt tcacaaattc ataaacaaat acg 33 

<210> 154 
<211> 33 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 154 

cccaagcttg gactttatca caatttccag cac 33 



<210> 155 
<211> 32 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 155 

gctctagagt ttctagattt ttagatttcg tg 32

<210> 156
<211> 34 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 156 

ccgctcgaga taatgaagct tcttcttctc attg 34 

<210> 157 
<211> 32 
<212> DNA

<213> Caenorhabditis elegans 
<400> 157 

gctctagagt ttctagattt ttagatttcg tg 32 

<210> 158 
<211> 85 
<212> PRT 

<213> Caenorhabditis elegans 
<400> 158

Met Ser Leu His Phe Ser Thr Ile Gln Lys Thr Ile Leu Leu Ile Ser
1 5 10 15

Phe Leu Leu Leu Val Thr Leu Ala Pro Arg Thr Ser Ala Ala Phe Pro
20 25 30

Phe Gln Ile Cys Val Lys Lys Met Glu Lys Met Cys Arg Ile Ile Asn
35 40 45

Pro Glu Gln Cys Ala Gln Val Asn Lys Ile Thr Glu Ile Gly Ala Leu
50 55 60

Thr Asp Cys Cys Thr Gly Leu Cys Ser Trp Glu Glu Ile Arg Ile Ser
65 70 75 80

Cys Cys Ser Val Leu 
85 



<210> 159 

<211> 81 

<212> PRT 

<213> Caenorhabditis elegans 

<400> 159 
Met Leu Thr His Leu Lys Phe Leu Leu Leu Val Ser Leu Phe Ile Asn
1 5 10 15

Phe Ala Val Ser Ser Glu Asp Ile Lys Cys Asp Ala Lys Phe Ile Ser
20 25 30

Arg Ile Thr Lys Leu Cys Ile His Gly Ile Thr Glu Asp Lys Leu Val
35 40 45

Arg Leu Leu Thr Arg Cys Cys Thr Ser His Cys Ser Lys Ala His Leu
50 55 60

Lys Met Phe Cys Thr Leu Lys Pro His Glu Glu Glu Pro His His Glu
65 70 75 80

Ile



<210> 160 
<211> 83 
<212> PRT 

<213> Caenorhabditis elegans
<400> 160 

Met Lys Leu Leu Pro Leu Ile Val Val Phe Ala Leu Leu Ala Val Ile
1 5 10 15

Ser Glu Ser Tyr Ser Gly Asn Asp Phe Gln Pro Arg Asp Asn Lys His
20 25 30

His Ser Tyr Arg Ser Cys Gly Glu Ser Leu Ser Arg Arg Val Ala Phe
35 40 45

Leu Cys Asn Gly Gly Ala Ile Gln Thr Glu Ile Leu Arg Ala Leu Asp
50 55 60

Cys Cys Ser Thr Gly Cys Thr Asp Lys Gln Ile Phe Ser Trp Cys Asp
65 70 75 80

Phe Gln Ile



<210> 161 
<211> 73 
<212> PRT

<213> Caenorhabditis elegans
<400> 161

Met Lys Leu Leu His Ile Phe Ile Ile Phe Leu Leu Phe Gln Ser Cys
1 5 10 15

Ser Asn Lys Met Cys Gln Tyr Ser Lys Lys Lys Tyr Lys Ile Cys Gly
20 25 30

Val Arg Ala Leu Lys His Met Lys Val Tyr Cys Thr Arg Gly Met Thr
35 40 45

Arg Asp Tyr Gly Lys Leu Leu Val Thr Cys Cys Ser Lys Gly Cys Asn
50 55 60

Ala Ile Asp Ile Gln Arg Ile Cys Leu
65 70 



<210> 162 
<211> 258 
<212> DNA

<213> Caenorhabditis elegans 
<400> 162 

atgtcactgc atttctccac tattcaaaaa acaattcttc taatctcatt cttgctcctc 60 
gtaacattgg ctcccagaac aagtgcagct tttccattcc aaatttgtgt caaaaaaatg 120 
gaaaaaatgt gcagaatcat caatccagag cagtgtgcac aagtaaataa aatcactgag 180 
attggagcat tgacagactg ttgcaccgga ctgtgctcct gggaagaaat ccggatctcc 240 
tgctgctccg ttttataa 258 

<210> 163 
<211> 246 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 163 

atgctcacac atctgaaatt cttgcttcta gtgagccttt ttatcaactt cgccgtaagc 60 
tctgaagaca tcaaatgcga tgcaaagttc atttcgagaa tcacgaaact ctgtattcac 120 
ggaattactg aagataaact tgttcgtctt ctcacaagat gctgcacatc tcactgctcc 180 
aaagctcatc tgaaaatgtt ctgcaccctg aaacctcacg aagaagaacc acatcacgaa 240 
atctaa 246 

<210> 164 
<211> 249 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 164 
atgaagcttc ttctcattgt ggtttttgct cttttggcag tcatatcaga atcatattct 60
ggaaatgact tccaacctcg tgacaataaa catcattcct atcgttcatg tggggaatcg 120
ttgagccgac gagttgcatt tctgtgtaat ggtggagcta ttcaaacaga aatactaaga 180 
gctcnggatt gttgttccac tggttgtacg gacaaacaga tcttttcttg gtgtgatttt 240 
caaatttga 249 

<210> 165 
<211> 222 
<212> DNA

<213> Caenorhabditis elegans 
<400> 165 

atgaagcttt tacatatttt tattattctt ctgttattcc aatcgtgctc taataaaatg 60
tgtcaatatt caaagaaaaa gtacaagatt tgtggagtta gagctattaa gcatatgaaa 120 
gtctattgta cacgtggaat gacaagagat tatggaaaat tactcgtgac ttgttgttcg 180 
aaaggatgta atgcaataga tatccaacgt atttgtttat ga 222 

<210> 166 
<211> 31 
<212> DNA

<213> Caenorhabditis elegans 
<400> 166

cccaagcttg gtttaattac ccaagtttga g 31 

<210> 167 
<211> 32 
<212> DNA

<213> Caenorhabditis elegans
<400> 167 

gctctagaca attttgatat taaattttgt cg 32

<210> 168 
<211> 31 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 168 

cccaagcttg gtttaattac ccaagtttga g 31 

<210> 169 
<211> 33 
<212> DNA

<213> Caenorhabditis elegans 
<400> 169 

gctctagatt aaattttgtc gattttcaag ttg 33 
<210> 170
<211> 15 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 170 

gttttcccag tcacg 15

<210> 171 
<211> 17 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 171 

caggaaacag ctatgac 17 

<210> 172 
<211> 34 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 172 

cccaayutLy ay cat L L Ly I LycLuLyCcta aatg 34 

<210> 173 
<211> 33 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 173 

gctctagatt aaattttgtc gattttcaag ttg 33 

<210> 174 
<211> 18
<212> DNA 

<213> Caenorhabditis elegans
<400> 174 

tagaaggcac agtcgagg 18 

<210> 175 
<211> 19 
<212> DNA

<213> Caenorhabditis elegans 
<400> 175 

taatacgact actataggg 19 
<210> 176
<211> 32 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 176 

cgggatcccc gcacaaactt atatgacaac tc 32 

<210> 177 
<211> 33 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 177 

cggaattcgg tgtctcataa tggtagtgga tac 33 

<210> 178 
<211> 32 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 178 

cgggatcccc gcacaaactt atatgacaac tc 32 

<210> 179 
<211> 33
<212> DNA 

<213> Caenorhabditis elegans 
<400> 179 

cggaattcgc aaaagagagg tatagggata aag 33 

<210> 180 
<211> 18 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 180 

tagaaggcac agtcgagg 18

<210> 181 
<211> 19 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 181 

taatacgact actataggg 19 
<210> 182
<211> 32 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 182

cccaagctta aaggcttaga tgcagaaaga cc 32 

<210> 183 
<211> 34 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 183 

gctctagagg gattaaaatc actctgtgat taag 34 

<210> 184 
<211> 33
<212> DNA 

<213> Caenorhabditis elegans
<400> 184 

cccaagctta aaggtggaca ttgtagaagg ttg 33 

<210> 185 
<211> 34 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 185 

gctctagagg gattaaaatc actctgtgat taag 34 

<210> 186 
<211> 34 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 186 

cccaagcttc cttcacttct cagcgaagga aatg 34

<210> 187 
<211> 33 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 187 

gctctagagt gctcatgctc cgttattttg tgc 33 






WO 99/54436 



PCT/US99/08522 



<210> 188 
<211> 34 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 188 

cggaattcct agaattttca ccccaaatgt tcag 34 

<210> 189 
<211> 33
<212> DNA 

<213> Caenorhabditis elegans 
<400> 189 

ccgctcgaga aatgtaagtg attggcaagt tgg 33 

<210> 190 
<211> 33
<212> DNA 

<213> Caenorhabditis elegans 
<400> 190 

cccaagctta gagacttaga cgcaaagagg acc 33 

<210> 191 
<211> 34 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 191 

gctctagagc aggaaaatta gctaaaacat aatg 34 



<210> 192 

<211> 32 

<212> DNA 

<213> Caenorhabditis elegans 

<400> 192 

cggaattcgg cgaaacactt ccgccaactc ac 32

<210> 193

<211> 34

<212> DNA

<213> Caenorhabditis elegans

<400> 193

ccgctcgaga cctaccgtca acttggagga caac 34 

<210> 194
<211> 34
<212> DNA

<213> Caenorhabditis elegans
<400> 194

cccaagcttc cttgcacctg ccttcaacca tcac 34

<210> 195
<211> 32 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 195 

gctctagata ttctgacccc aaaatgacaa tc 32 

<210> 196 
<211> 34 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 196 

cccaagcttt tctgcagact tgcaaggtta gctc 34 

<210> 197 
<211> 32 
<212> DNA 

<213> Caenorhabditis elegans 
<400> 197 

gctctagaat tcacaaaata atcaagacaa tc 32 

<210> 198 
<211> 103 
<212> PRT 

<213> Caenorhabditis elegans 
<400> 198 

Met Arg Ser Pro Thr Leu Phe Leu Leu Leu Leu Leu Val Pro Leu Ala
1 5 10 15

Leu Cys His Val Phe Ser Glu Pro Ala Asp Leu Glu Leu Lys Ser Tyr
20 25 30

Gln Ala Leu Glu Lys Ser Leu Lys Glu Met Gly Leu Ile Arg Ala Asn
35 40 45

Gln Gly Pro Gln Lys Ala Cys Gly Arg Ser Met Met Met Lys Val Gln
50 55 60

Lys Leu Cys Ala Gly Gly Cys Thr Ile Gln Asn Asp Asp Leu Thr Ile
65 70 75 80

Lys Ser Cys Ser Thr Gly Tyr Thr Asp Ala Gly Phe Ile Ser Ala Cys
85 90 95

Cys Pro Ser Gly Phe Val Phe
100



<210> 199 
<211> 72
<212> PRT 

<213> Caenorhabditis elegans 
<400> 199 

Met Leu Phe Lys Ile Ile Ile Leu Phe Phe Leu Leu Leu Gln Leu Ser
1 5 10 15

Glu Ala Lys Pro Glu Ala Gln Arg Arg Cys Gly Arg Tyr Leu Ile Arg
20 25 30

Phe Leu Gly Glu Leu Cys Asn Gly Pro Cys Ser Gly Val Ser Ser Val
35 40 45

Asp Ile Ala Thr Ile Ala Cys Ala Thr Ala Val Pro Ile Glu Asp Leu
50 55 60

Lys Asn Met Cys Cys Pro Asn Leu
65 70



<210> 200 
<211> 110 
<212> PRT 

<213> Caenorhabditis elegans 
<400> 200 

Met Arg Ala Leu Val Ala Ile Leu Cys Leu Met Ala Leu Cys His Ala
1 5 10 15

Ala Met Leu Asp Glu Leu Glu Met Gln Lys Glu Val Gln Glu Phe His
20 25 30

His Met Asn Gly Met Leu Gln Glu Phe Met Asn Lys Gly Leu Ile Gly
35 40 45

Asn His His His Gly Thr Lys Ala Gly Leu Thr Cys Gly Met Asn Ile
50 55 60

Ile Glu Arg Val Asp Lys Leu Cys Asn Gly Gln Cys Thr Arg Asn Tyr
65 70 75 80

Asp Ala Leu Val Ile Lys Ser Cys His Arg Gly Val Ser Asp Met Glu
85 90 95

Phe Met Val Ala Cys Cys Pro Thr Met Lys Leu Phe Ile His
100 105 110



<210> 201 
<211> 67 
<212> PRT 

<213> Caenorhabditis elegans 
<400> 201 

Met Met Arg Ser Phe Phe Val Leu Leu Ala Leu Leu Ala Ile Val Thr
1 5 10 15

Ser Thr Ala Ser Pro Thr Cys Gly Arg Ala Leu Leu His Arg Ile Gln
20 25 30

Ser Val Cys Gly Leu Cys Thr Ile Asp Ala His His Glu Leu Ile Ala
35 40 45

Ile Ala Cys Ser Arg Gly Leu Gly Asp Lys Glu Ile Ile Glu Met Cys
50 55 60

Cys Pro Ile
65



<210> 202 
<211> 76 
<212> PRT 

<213> Caenorhabditis elegans 
<400> 202

Met Phe Cys Lys Phe Val Phe Leu Ile Phe Leu Leu Ile Ser Leu Ser
1 5 10 15

Val Ala Thr Ala Asp Phe Gly Ala Gln Arg Arg Cys Gly Arg His Leu
20 25 30

Val Asn Phe Leu Glu Gly Leu Cys Gly Gly Pro Cys Ser Glu Ala Pro
35 40 45

Thr Val Glu Leu Ala Ser Trp Ala Cys Ser Ser Ala Val Ser Ile Gln
50 55 60

Asp Leu Glu Lys Leu Cys Cys Pro Ser Asn Leu Ala
65 70 75



<210> 203 
<211> 120 
<212> PRT 

<213> Caenorhabditis elegans 
<400> 203 

Met Ser Ser His Ala Leu Val Leu Phe Leu Leu Leu Phe Leu Leu Pro
1 5 10 15

Val Ala Leu Gly His Phe Leu Ser Lys Pro Ala Pro Asp Pro Arg Ile
20 25 30

Thr Phe Asn Arg Lys Leu Ala Glu Thr Leu Lys Glu Leu Gln Asp Met
35 40 45

Gly Leu Ile Gln Ala Pro Arg Glu Pro Val Val Ala Ala Gln Gly Ala
50 55 60

Lys Lys Thr Cys Gly Arg Ser Leu Leu Ile Lys Ile Gln Gln Leu Cys
65 70 75 80

His Gly Ile Cys Thr Val His Ala Asp Asp Leu His Glu Thr Ala Cys
85 90 95

Met Lys Gly Leu Thr Asp Ser Gln Leu Ile Asn Ser Cys Cys Pro Pro
100 105 110

Ile Pro Gln Thr Pro Phe Val Phe
115 120



<210> 204 

<211> 218 

<212> PRT 

<213> Caenorhabditis elegans 

<400> 204 
Met Lys Met Pro Leu Ile Leu Leu Leu Leu Val Ala Ala Ala Ser Ala
1 5 10 15

Phe Val His His Phe Asp His Ser Met Phe Ala Arg Pro Glu Lys Thr
20 25 30

Cys Gly Gly Leu Leu Ile Arg Arg Val Asp Arg Ile Cys Pro Asn Leu
35 40 45

Asn Tyr Thr Tyr Lys Ile Glu Trp Glu Leu Met Asp Asn Cys Cys Glu
50 55 60

Val Val Cys Glu Asp Gln Trp Ile Lys Glu Thr Phe Cys Arg Ala Pro
65 70 75 80

Arg Phe Asn Phe Phe Gly Pro Ser Phe Lys Ala Leu Glu Arg Ser Cys
85 90 95

Gly Pro Lys Leu Phe Thr Arg Val Lys Thr Val Cys Gly Glu Asp Ile
100 105 110

Asn Val Asp Asn Lys Val Lys Ile Ser Asp His Cys Cys Thr Pro Glu
115 120 125

Gly Gly Cys Thr Asp Asp Trp Ile Lys Glu Asn Val Cys Lys Gln Thr
130 135 140

Arg Phe Asn Phe Phe Arg Gln Phe Leu Asp Ser Pro Gln Arg Ser Cys
145 150 155 160

Gly Pro Gln Leu Phe Lys Arg Val Asn Thr Leu Cys Asn Glu Asn Ile
165 170 175

Asn Val Glu Asn Asn Val Ser Val Ser Lys Ser Cys Cys Glu Ser Ala
180 185 190

Ala Gly Cys Thr Asp Asp Trp Ile Lys Lys Asn Val Cys Thr Gln His
195 200 205

Lys Pro Phe Val Phe Arg Pro Gly Phe Tyr
210 215



<210> 205 
<211> 107 
<212> PRT 

<213> Caenorhabditis elegans
<400> 205

Met Ile Phe Tyr Leu Thr Thr Tyr Leu Val Thr Met Ser Pro Leu Phe
1 5 10 15

Leu Ile Leu Leu Leu Leu Val Ser Thr Thr Tyr Pro Tyr Ile Ile Asp
20 25 30

Ser Ser Glu Ser Tyr Glu Val Leu Met Leu Phe Gly Tyr Lys Arg Thr
35 40 45

Cys Gly Arg Arg Leu Met Asn Arg Ile Asn Arg Val Cys Val Lys Asp
50 55 60

Ile Asp Pro Ala Asp Ile Asp Pro Lys Ile Lys Leu Ser Glu His Cys
65 70 75 80

Cys Ile Lys Gly Cys Thr Asp Gly Trp Ile Lys Lys His Ile Cys Ser
85 90 95

Glu Glu Val Leu Asn Phe Gly Phe Phe Glu Asn
100 105


<210> 206
<211> 77 
<212> PRT 

<213> Caenorhabditis elegans 
<400> 206 

Met Gln Ser Leu Pro Ile Leu Ala Cys Leu Leu Thr Leu Ser Val Phe
1 5 10 15

Ala Pro Glu Ile His Gly Arg Glu Leu Lys Arg Cys Ser Val Lys Leu
20 25 30

Phe Asp Ile Leu Ser Val Ile Cys Gly Thr Glu Ser Asp Ala Glu Ile
35 40 45

Leu Gln Lys Val Ala Val Lys Cys Cys Gln Glu Gln Cys Gly Phe Glu
50 55 60

Glu Met Cys Gln His Ala Asn Leu Lys Ile Asp Lys Ile
65 70 75



<210> 207 
<211> 312 
<212> DNA 
<213> Caenorhabditis elegans



<400> 207 

atgagacctc ccaccttgtt tcttcttctg ctcctagtgc ccctggcact atgccatgtc 60 

ttctcggagc ccgcggattt ggagctcaaa agctaccaag cgcttgaaaa aagcctcaag 120 

gagatgggac tcattcgagc caaccaggga cctcaaaaag cgtgcggacg atcaatgatg 180 

acgaaggtgc agaagctttg cgcgggcgga tgcacaattc agaacgacga tcttaccatc 240 

aaatcctgca gtactgggta caccgatgcc ggcttcatct cggcctgctg cccatctggc 300 
ttcgttttct aa 312 



<210> 208 

<211> 216 

<212> DNA 

<213> Caenorhabditis elegans 



<400> 208

atgttgttca aaatcatcat tttatttttc ctgctccagc tttctgaagc caaaccggaa 60

gcccagaggc gctgcggccg gtatttaatt cgttttttgg gggaactgtg taatggtccc 120

tgctcaggag tttcaagcgt tgacattgcc acaattgcct gtgcaaccgc cgtcccaatc 180
gaagatctga agaatatgtg ttgcccaaat ttgtga 216



<210> 209

<211> 333

<212> DNA

<213> Caenorhabditis elegans



<400> 209

atgagagctc tcgtcgctat tctctgcctt atggcactat gccatgcagc aatgctcgat 60

gagctggaga tgcagaagga ggttcaggag ttccatcaca tgaacggcat gctccaagag 120 

ttcatgaata aggggctcat cgggaatcat caccatggta ccaaggccgg cctcacctgc 180 

gggatgaaca tcatcgagag agtcgacaag ctgtgcaatg ggcagtgcac tcggaactat 240 

gatgcactcg tcatcaagtc ctgccaccgc ggagtctcgg acatggagtt catggtggca 300 
tgctgcccaa ccatgaagct attcattcac taa 333 



<210> 210 

<211> 204 

<212> DNA 

<213> Caenorhabditis elegans



<400> 210 

atgatgcgct cattctttgt gctcttggct ctgctcgcaa tagtcaccag caccgctagt 60
cccacttgtg gcagggctct tctacaccgg atccagtcgg tctgcggtcc ctgtaccatc 120
gacgctcacc acgaactgat tgccattgcc tgctcaaggg gactgggcga taaggaaatc 180
attgaaatgt gctgtccaat ctaa 204



<210> 211
<211> 231
<212> DNA
<213> Caenorhabditis elegans



<400> 211 

atgttctgta aatttgtatt cctcatcttt ctactcatct ctctgtcagt ggccaccgct 60

gactttggcg cccagcgccg ttgtgggcgc cacttggtga acttcctcga gggactctgc 120

ggtggcccgt gctctgaagc tccgactgtt gaactagctt cgtgggcatg ttcatcagca 180

gtctcaattc aggatctcga aaaattgtgc tgtccttcaa atcttgcttg a 231



<210> 212 

<211> 363 

<212> DNA 

<213> Caenorhabditis elegans 



<400> 212 

atgagttctc acgccccggt tcttttcctt ctccttttcc tcctaccagt ggcactgggc 60 

cacttcctct ccaagcctgc accggatcca aggatcacat tcaaccgtaa gcttgcggag 120 

acactcaagg agcttcagga catgggactc atccaggccc cccgtgagcc ggtagtggcg 180 

gctcagggag ccaagaagac ttgcggaagg agtttgttga taaagatcca acaactctgc 240

catggaatct gcacagttca cgctgatgac ctccacgaaa cggcatgcat gaaaggtctc 300

accgactctc agctgatcaa ctcctgctgc ccaccaatcc cccagacacc attcgtcttc 360 
tga 363 



<210> 213 

<211> 657 

<212> DNA 

<213> Caenorhabditis elegans 



<400> 213 

atgaagatgc ccttgatctt gctgcttctc gtcgccgccg catcggcgtt cgtccaccac 60

tttgaccatt caatgtttgc cagaccggag aaaacgtgtg gaggactact cattcgtcgt 120

gtcgatagaa tttgcccgaa tctaaattac acatataaaa ttgactggga acttatggac 180

aactgttgcg aagtggtttg cgaggaccag tggattaagg aaaccttttg cagagcgccc 240

aggttcaact ttttcggacc ttcattcaaa gcccttgaaa gatcgtgtgg accaaaactg 300

ttcacaaggg ttaaaactgt gtgcggtgaa gacatcaatg ttgataataa agtcaagatt 360

tcggatcact gctgcacacc agagggagga tgcacagacg actggatcaa ggagaacgtc 420 

tgcaaacaga ccagattcaa ctttttccga caatttctcg attcccctca aagatcatgt 480 

ggaccccagt tgttcaaaag agtgaatact ttgtgtaatg aaaatatcaa tgttgaaaat 540 

aatgtaagcg tgtcgaaaag ctgttgcgaa tcagcggcag gatgcacgga tgattggatt 600 

aagaagaatg tctgcacaca gcataagcct tttgttttcc gtccagcctt ttactga 657 



<210> 214 

<211> 324 

<212> DNA 

<213> Caenorhabditis elegans 



<400> 214 

atgattttct atctgacaac ctacctagta actatgtcac ctctcttcct gatcctgttg 60 
cttctagtct ctaccactta cccttacatc attgactcct cggagagtta tgaagttcta 120 
atgctattcg ggtataagag aacatgtgga cgacgcttga tgaacaggat taatagagta 180
tgcgtgaagg atatagatcc agcagatatc gatccgaaga tcaaattatc ggagcactgt 240
tgtatcaagg gatgcacaga tggatggatc aagaagcata tttgcagtga ggaagttctg 300
aattttggat tttttgaaaa ttga 324



<210> 215

<211> 234 

<212> DNA 

<213> Caenorhabditis elegans



<400> 215 

atgcaaagcc taccaattct tgcctgcctc ctcacactgt cagtttttgc gccggaaatt 60

catggccggg agctcaaacg ttgttctgtg aaactttttg atattctaag cgtaatttgt 120

ggaactgaaa gtgatgcaga aattctacaa aaagtcgcag tgaaatgctg ccaggagcag 180

tg-gggtttg aggaaatgtg ccagcatgcc aacrtgaaaa tcgacaaaat ttaa 234